🤖 AI Summary
This study addresses the challenges of early wildfire detection from satellite remote sensing, including weak smoke signals, complex meteorological interference, and the need for large-scale real-time analysis. The work proposes the first end-to-end AI system that integrates high-precision object detection (YOLOv12) with a multimodal large language model (MLLM) to jointly perform smoke and fire-spot detection, context-aware risk assessment, and generation of actionable emergency recommendations. By closing the loop from visual perception to semantic risk reasoning, the system enables interpretable decision-making for disaster response. It is built on a service-oriented architecture and validated on real satellite data. An LLM-as-judge evaluation rates its risk assessments as high quality, and the deployment supports real-time alerts, interactive visualization dashboards, and long-term fire monitoring.
📝 Abstract
Wildfires are a growing threat to ecosystems, human lives, and infrastructure, with their frequency and intensity rising due to climate change and human activities. Early detection is critical, yet satellite-based monitoring remains challenging due to faint smoke signals, dynamic weather conditions, and the need for real-time analysis over large areas. We introduce WildfireVLM, an AI framework that combines wildfire detection in satellite imagery with language-driven risk assessment. We construct a labeled wildfire and smoke dataset using imagery from Landsat-8/9, GOES-16, and other publicly available Earth observation sources, including harmonized products with aligned spectral bands. WildfireVLM employs YOLOv12 to detect fire zones and smoke plumes, leveraging its ability to detect small, complex patterns in satellite imagery. We integrate Multimodal Large Language Models (MLLMs) that convert detection outputs into contextualized risk assessments and prioritized response recommendations for disaster management. We validate the quality of risk reasoning using an LLM-as-judge evaluation with a shared rubric. The system is deployed using a service-oriented architecture that supports real-time processing, visual risk dashboards, and long-term wildfire tracking, demonstrating the value of combining computer vision with language-based reasoning for scalable wildfire monitoring.
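The hand-off from detector to language model described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Detection` fields, the `min_conf` threshold, the `summarize` and `build_risk_prompt` helpers, and the prompt wording are all assumptions made for illustration. The sketch only shows how detector outputs might be condensed into text that an MLLM could reason over, and it calls neither YOLOv12 nor any model API.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Detection:
    """One detector output. Field names are illustrative assumptions."""
    label: str                               # e.g. "fire" or "smoke"
    confidence: float                        # detector score in [0, 1]
    bbox: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), pixels


def summarize(detections: List[Detection], min_conf: float = 0.25) -> Dict[str, dict]:
    """Count detections per class and keep the top score, dropping weak ones."""
    summary: Dict[str, dict] = {}
    for det in detections:
        if det.confidence < min_conf:
            continue  # hypothetical threshold, not taken from the paper
        entry = summary.setdefault(det.label, {"count": 0, "max_conf": 0.0})
        entry["count"] += 1
        entry["max_conf"] = max(entry["max_conf"], det.confidence)
    return summary


def build_risk_prompt(detections: List[Detection], region: str,
                      min_conf: float = 0.25) -> str:
    """Turn detections into a text prompt requesting a risk assessment."""
    summary = summarize(detections, min_conf)
    if summary:
        findings = "\n".join(
            f"- {label}: {s['count']} detection(s), max confidence {s['max_conf']:.2f}"
            for label, s in sorted(summary.items())
        )
    else:
        findings = "No fire or smoke detections above threshold."
    return (
        f"Satellite scene over {region}.\n"
        f"Detector findings:\n{findings}\n"
        "Assess the wildfire risk and list prioritized response actions."
    )


if __name__ == "__main__":
    sample = [
        Detection("fire", 0.91, (120.0, 80.0, 160.0, 130.0)),
        Detection("smoke", 0.62, (100.0, 20.0, 240.0, 90.0)),
        Detection("smoke", 0.10, (300.0, 40.0, 330.0, 70.0)),  # filtered out
    ]
    print(build_risk_prompt(sample, region="example region"))
```

In a deployed pipeline the returned string would typically be sent to the MLLM together with the image. Keeping the summary step separate from the prompt step makes the intermediate counts easy to log and to show on a dashboard.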