Class-specific diffusion models improve military object detection in a low-data domain

📅 2026-04-20
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses military object detection under extreme data scarcity, specifically when only 8 or 24 real images per class are available across 15 vehicle classes. The authors fine-tune the FLUX.1 [dev] text-to-image diffusion model with LoRA to obtain class-specific generators, which then synthesize training data from automatically generated text prompts. To control object pose and viewpoint, they also condition generation with ControlNet on Canny edge maps. The approach needs no real data beyond the same limited samples used to train the detector. With an RF-DETR detector, the FLUX-generated data improves performance by up to 8.0% mAP50 with eight real images per class. Adding ControlNet guidance yields a further 4.1% mAP50 in that setting, but brings no additional benefit when more real data is available. The authors present generative image data as an alternative to conventional simulation-based pipelines for training military AI systems.
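For reference, LoRA fine-tuning in its standard form freezes each adapted pretrained weight matrix $W_0$ and learns only a low-rank update. The rank $r$, the scaling $\alpha$ and which FLUX.1 layers were adapted are not given here, so the symbols below are generic rather than the paper's settings:

$$
W = W_0 + \Delta W = W_0 + \frac{\alpha}{r} B A, \qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)
$$

Only $A$ and $B$ are trained, so the trainable parameters per adapted matrix fall from $dk$ to $r(d + k)$. With illustrative values $d = k = 3072$ and $r = 16$, that is $16 \times 6144 = 98{,}304$ parameters instead of $3072^2 = 9{,}437{,}184$, roughly 1%. A small adapter of this kind keeps training and storing one generator per vehicle class inexpensive.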

📝 Abstract
Diffusion-based image synthesis has emerged as a promising source of synthetic training data for AI-based object detection and classification. In this work, we investigate whether images generated with diffusion can improve military vehicle detection under low-data conditions. We fine-tuned the text-to-image diffusion model FLUX.1 [dev] using LoRA with only 8 or 24 real images per class across 15 vehicle categories, resulting in class-specific diffusion models, which were used to generate new samples from automatically generated text prompts. The same real images were used to fine-tune the RF-DETR detector for a 15-class object detection task. Synthetic datasets generated by the diffusion models were then used to further improve detector performance. Importantly, no additional real data was required, as the generative models leveraged the same limited training samples. FLUX-generated images improved detection performance, particularly in the low-data regime (up to +8.0% mAP$_{50}$ with 8 real samples). To address the limited geometric control of text prompt-based diffusion, we additionally generated structurally guided synthetic data using ControlNet with Canny edge-map conditioning, yielding a FLUX-ControlNet (FLUX-CN) dataset with explicit control over viewpoint and pose. Structural guidance further enhanced performance when data is scarce (+4.1% mAP$_{50}$ with 8 real samples), but no additional benefit was observed when more real data is available. This study demonstrates that object-specific diffusion models are effective for improving military object detection in a low-data domain, and that structural guidance is most beneficial when real data is highly limited. These results highlight generative image data as an alternative to traditional simulation pipelines for the training of military AI systems.
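The results above are reported in mAP$_{50}$. As a reading aid, the sketch below shows how that metric is commonly computed. For each class, detections are ranked by confidence. A detection counts as a true positive when its highest-IoU ground-truth box has an IoU of at least 0.5 and has not already been claimed by a higher-scoring detection. AP is the area under the resulting precision-recall curve, and mAP$_{50}$ averages AP over the classes (15 here). This is a simplified Pascal VOC-style evaluator with all-point interpolation, written for illustration. It is not the evaluation code used with RF-DETR in the paper, and COCO-style tools that sample precision at 101 recall points can give slightly different values. The box format and the names `iou`, `average_precision_50` and `map_50` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision_50(
    detections: List[Tuple[str, float, Box]],  # (image_id, score, box) for ONE class
    ground_truth: Dict[str, List[Box]],  # image_id -> ground-truth boxes of that class
    iou_threshold: float = 0.5,
) -> float:
    """Single-class AP at IoU >= 0.5 with VOC-style all-point interpolation."""
    n_gt = sum(len(boxes) for boxes in ground_truth.values())
    if n_gt == 0:
        return 0.0
    claimed = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}
    hits: List[bool] = []
    # Highest-scoring detections are matched first.
    for img, _score, box in sorted(detections, key=lambda d: d[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(ground_truth.get(img, [])):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        # True positive only if the best-overlapping box clears the threshold
        # and is still unclaimed; duplicate detections count as false positives.
        if best_j >= 0 and best_iou >= iou_threshold and not claimed[img][best_j]:
            claimed[img][best_j] = True
            hits.append(True)
        else:
            hits.append(False)

    # Cumulative precision and recall down the ranked detection list.
    precisions: List[float] = []
    recalls: List[float] = []
    tp = 0
    for rank, hit in enumerate(hits, start=1):
        tp += int(hit)
        precisions.append(tp / rank)
        recalls.append(tp / n_gt)
    # Precision envelope: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_recall)
        prev_recall = r
    return ap


def map_50(per_class_ap: List[float]) -> float:
    """mAP50 as the unweighted mean of per-class AP50 values."""
    return sum(per_class_ap) / len(per_class_ap)


if __name__ == "__main__":
    gt = {"img0": [(0, 0, 10, 10), (20, 20, 30, 30)]}
    dets = [
        ("img0", 0.9, (0, 0, 10, 10)),    # exact match: true positive
        ("img0", 0.8, (50, 50, 60, 60)),  # no overlap: false positive
        ("img0", 0.7, (21, 21, 31, 31)),  # IoU = 81/119, about 0.68: true positive
    ]
    print(f"AP50 = {average_precision_50(dets, gt):.4f}")  # prints "AP50 = 0.8333"
```

In the toy example, ranking the false positive last instead of second would raise AP from about 0.83 to 1.0. This shows that mAP$_{50}$ depends on the confidence ordering as well as on localization.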
Problem

Research questions and friction points this paper is trying to address.

low-data domain
military object detection
diffusion models
synthetic data
object detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

class-specific diffusion models
low-data object detection
ControlNet
synthetic data generation
military vehicle detection
Ella P. Fokkinga
TNO - Intelligent Imaging, Oude Waalsdorperweg 63, The Hague, the Netherlands
Jan Erik van Woerden
TNO - Intelligent Imaging, Oude Waalsdorperweg 63, The Hague, the Netherlands
Thijs A. Eker
TNO - Intelligent Imaging, Oude Waalsdorperweg 63, The Hague, the Netherlands
Sebastiaan P. Snel
TNO - Intelligent Imaging, Oude Waalsdorperweg 63, The Hague, the Netherlands
Elfi I. S. Hofmeijer
TNO - Intelligent Imaging, Oude Waalsdorperweg 63, The Hague, the Netherlands
Klamer Schutte
TNO, Intelligent Imaging
Artificial intelligence, image processing, computer vision
Friso G. Heslinga
TNO - Intelligent Imaging, Oude Waalsdorperweg 63, The Hague, the Netherlands