🤖 AI Summary
Existing wildfire modeling approaches are constrained by coarse spatiotemporal scales and low-resolution remote sensing data, limiting high-fidelity simulation of localized fire dynamics. To address this, we present FireSentry, a provincial-scale multimodal dataset with sub-meter spatial and sub-second temporal resolution that integrates UAV-captured visible and infrared video sequences, in-situ environmental sensor measurements, and expert-annotated fire mask ground truth, and we propose FiReDiff, a novel dual-modality prediction paradigm. Methodologically, FiReDiff is a two-stage pipeline: it first generates high-fidelity infrared video frames via diffusion modeling, then performs semantic segmentation to extract fire region masks, thereby overcoming the limitations of end-to-end mask prediction and improving temporal coherence and physical interpretability. Experiments demonstrate significant improvements: a 50.0% reduction in LPIPS (indicating superior video reconstruction quality), and gains of 59.1% in fire segmentation F1-score and 42.9% in IoU. This advances fine-grained wildfire spread forecasting and supports more effective emergency response decision-making.
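The predict-then-segment idea can be sketched in a few lines. The interfaces below are illustrative assumptions, not the paper's actual API: `generate_future` stands in for the diffusion-based video predictor and `segment_fire` for the segmentation model, demonstrated here with toy stand-ins on 2x2 intensity grids.

```python
# Sketch of the dual-modality pipeline (hypothetical interfaces, not the paper's code).
# Stage 1: a video generator predicts future infrared frames from observed ones.
# Stage 2: a segmenter extracts a binary fire mask from each generated frame.

def firediff_pipeline(past_ir_frames, generate_future, segment_fire):
    """Run the two-stage predict-then-segment pipeline.

    past_ir_frames: list of observed infrared frames (any representation).
    generate_future: callable mapping observed frames -> list of future frames.
    segment_fire: callable mapping one frame -> binary fire mask.
    """
    future_frames = generate_future(past_ir_frames)        # Stage 1: video prediction
    fire_masks = [segment_fire(f) for f in future_frames]  # Stage 2: per-frame segmentation
    return future_frames, fire_masks

# Toy demonstration: frames are 2x2 grids of intensities; the stand-in generator
# brightens the last observed frame, and "fire" is thresholded at 0.5.
past = [[[0.0, 0.2], [0.6, 0.8]]]
gen = lambda frames: [[[v + 0.1 for v in row] for row in frames[-1]] for _ in range(2)]
seg = lambda frame: [[1 if v > 0.5 else 0 for v in row] for row in frame]
frames, masks = firediff_pipeline(past, gen, seg)
```

Decoupling the two stages in this way is what lets the generated infrared dynamics carry temporal coherence into the masks, rather than predicting masks end-to-end.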
📝 Abstract
Fine-grained wildfire spread prediction is crucial for enhancing emergency response efficacy and decision-making precision. However, existing research predominantly focuses on coarse spatiotemporal scales and relies on low-resolution satellite data, capturing only macroscopic fire states and fundamentally constraining high-precision modeling of localized fire dynamics. To bridge this gap, we present FireSentry, a provincial-scale multi-modal wildfire dataset with sub-meter spatial and sub-second temporal resolution. Collected using synchronized UAV platforms, FireSentry provides visible and infrared video streams, in-situ environmental measurements, and manually validated fire masks. Building on FireSentry, we establish a comprehensive benchmark encompassing physics-based, data-driven, and generative models, revealing the limitations of existing mask-only approaches. Motivated by this analysis, we propose FiReDiff, a novel dual-modality paradigm that first predicts future video sequences in the infrared modality and then precisely segments fire masks in the mask modality based on the generated dynamics. FiReDiff achieves state-of-the-art performance when applied to generative models, with video quality improvements of 39.2% in PSNR, 36.1% in SSIM, 50.0% in LPIPS, and 29.4% in FVD, and mask accuracy improvements of 3.3% in AUPRC, 59.1% in F1 score, 42.9% in IoU, and 62.5% in MSE. The FireSentry benchmark dataset and FiReDiff paradigm collectively advance fine-grained wildfire forecasting and dynamic disaster simulation. The processed benchmark dataset is publicly available at: https://github.com/Munan222/FireSentry-Benchmark-Dataset.
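The mask-accuracy metrics reported above follow their standard definitions. As a minimal reference sketch (pure Python, flat binary masks; the example masks are made up for illustration), F1, IoU, and MSE can be computed from pixel-wise counts:

```python
# Standard mask-quality metrics (F1, IoU, MSE) for flat binary masks.
def mask_metrics(pred, true):
    """pred, true: equal-length sequences of 0/1 pixel labels."""
    tp = sum(p == 1 and t == 1 for p, t in zip(pred, true))  # true positives
    fp = sum(p == 1 and t == 0 for p, t in zip(pred, true))  # false positives
    fn = sum(p == 0 and t == 1 for p, t in zip(pred, true))  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0      # intersection over union
    mse = sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)
    return f1, iou, mse

# Toy 6-pixel masks (illustrative values only).
pred = [1, 1, 0, 0, 1, 0]
true = [1, 0, 0, 0, 1, 1]
f1, iou, mse = mask_metrics(pred, true)
```

Note that F1, IoU, and AUPRC are higher-is-better, while MSE (like LPIPS and FVD on the video side) is lower-is-better, so the reported percentages denote relative improvements in each metric's favorable direction.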