AI Summary
To address the scarcity of real-world forest fire smoke imagery and the smoke-background inconsistency prevalent in existing synthesis methods, this paper proposes a mask-guided smoke image synthesis framework. The method fuses smoke masks with background image features, introduces a mask-randomized differential loss to enhance spatial and semantic consistency, and leverages a multimodal large language model to filter high-fidelity synthetic samples. Technically, it integrates a pre-trained segmentation model for precise mask generation, a diffusion model to characterize smoke distribution, and a feature fusion architecture for fine-grained synthesis. The resulting synthetic dataset significantly improves downstream smoke detection performance, yielding up to an 8.2% mAP gain across multiple benchmarks. This effectively alleviates the bottleneck of insufficient real smoke samples and provides robust, scalable data support for intelligent wildfire monitoring.
Abstract
Smoke is the first visible indicator of a wildfire. With the advancement of deep learning, image-based smoke detection has become a crucial method for detecting and preventing forest fires. However, the scarcity of smoke image data from forest fires is one of the significant factors hindering the detection of forest fire smoke. Image generation models offer a promising solution for synthesizing realistic smoke images. However, current inpainting models exhibit limitations in generating high-quality smoke representations, particularly manifesting as inconsistencies between synthesized smoke and background contexts. To solve these problems, we propose a comprehensive framework for generating forest fire smoke images. Firstly, we employ a pre-trained segmentation model and a multimodal model to obtain smoke masks and image captions. Then, to address the insufficient utilization of masks and masked images by inpainting models, we introduce a network architecture guided by mask and masked image features. We also propose a new loss function, the mask random difference loss, which enhances the consistency of the generated effects around the mask by randomly expanding and eroding the mask edges. Finally, to generate a smoke image dataset using random masks for subsequent detection tasks, we incorporate smoke characteristics and use a multimodal large language model as a filtering tool to select diverse and plausible smoke images, thereby improving the quality of the synthetic dataset. Experiments show that our generated smoke images are realistic and diverse, and effectively enhance the performance of forest fire smoke detection models. Code is available at https://github.com/wghr123/MFGDiffusion.
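The idea of "randomly expanding and eroding the mask edges" can be illustrated with morphological operations on a binary mask: a randomly dilated and a randomly eroded variant of the mask define a band around the original edge, and a reconstruction loss can be weighted more heavily inside that band. The sketch below is a hypothetical illustration of this mechanism (function names, the weighting scheme, and the plain L2 loss are assumptions; the paper's exact loss formulation may differ):

```python
import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion

rng = np.random.default_rng(0)

def random_edge_band(mask, max_iters=5, rng=rng):
    """Randomly dilate and erode a binary mask; the region between the
    two variants forms a ring around the original mask edge.
    (Hypothetical sketch of the mask random difference idea.)"""
    d = int(rng.integers(1, max_iters + 1))  # random dilation steps
    e = int(rng.integers(1, max_iters + 1))  # random erosion steps
    dilated = binary_dilation(mask, iterations=d)
    eroded = binary_erosion(mask, iterations=e)
    return dilated & ~eroded  # boundary band

def edge_weighted_l2(pred, target, mask, band_weight=2.0):
    """L2 loss with extra weight on the randomized edge band, which
    pushes the generator toward consistency around the mask boundary."""
    band = random_edge_band(mask)
    weights = np.where(band, band_weight, 1.0)
    return float(np.mean(weights * (pred - target) ** 2))

# Toy example: a square "smoke" mask on a 32x32 image.
mask = np.zeros((32, 32), dtype=bool)
mask[8:24, 8:24] = True
pred = rng.random((32, 32))
target = rng.random((32, 32))
print(edge_weighted_l2(pred, target, mask))
```

In a diffusion training loop, a weighting of this kind would be applied per sample so the band location varies across iterations, discouraging visible seams at the mask boundary.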