Class-specific diffusion models improve military object detection in a low-data domain

📅 2026-04-20

🤖 AI Summary

This work addresses military object detection under extreme data scarcity, specifically when only 8 or 24 real images per class are available. The authors fine-tune the FLUX.1 [dev] text-to-image diffusion model with LoRA to obtain class-specific generators, then synthesize training data from automatically generated text prompts. To control object pose and viewpoint, they add ControlNet with Canny edge-map guidance. The approach combines class-specific diffusion-based generation with structural constraints and needs no real data beyond the same few images used to train the detector. With an RF-DETR detector, the synthetic data yields up to +8.0% mAP50 with only eight real images per class, and ControlNet guidance adds a further +4.1% mAP50 in that setting, although it brings no additional benefit when more real data is available. The authors present generated image data as an alternative to conventional simulation-based pipelines.
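The summary rests on LoRA fine-tuning of FLUX.1 [dev]. As a reminder of what LoRA actually trains, here is a minimal plain-Python sketch of the standard LoRA update W' = W + (α/r)·BA for a single linear layer. It is an illustration under stated assumptions, not the authors' training code: the rank, α, and the hypothetical 3072×3072 layer size are example values, not settings reported in the paper.

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-Python product of x (n x k) and y (k x m)."""
    k = len(y)
    if any(len(row) != k for row in x):
        raise ValueError("inner dimensions must match")
    return [[sum(row[i] * y[i][j] for i in range(k)) for j in range(len(y[0]))] for row in x]


def lora_merged_weight(w: Matrix, a: Matrix, b: Matrix, alpha: float) -> Matrix:
    """Return W + (alpha / r) * B @ A.

    w: frozen pretrained weight, d_out x d_in (not updated during fine-tuning)
    a: trainable down-projection, r x d_in
    b: trainable up-projection, d_out x r (LoRA initializes it to zeros)
    """
    r = len(a)
    delta = matmul(b, a)  # d_out x d_in, but rank at most r
    scale = alpha / r
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))] for i in range(len(w))]


def lora_param_count(d_out: int, d_in: int, r: int) -> int:
    """Trainable parameters in A and B, versus d_out * d_in for full fine-tuning."""
    return r * (d_in + d_out)


if __name__ == "__main__":
    identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    a = [[1.0, 0.0, 0.0, 0.0]]        # rank r = 1
    b = [[1.0], [0.0], [0.0], [0.0]]
    merged = lora_merged_weight(identity, a, b, alpha=2.0)
    print(merged[0][0])  # 3.0, i.e. 1.0 + (2.0 / 1) * 1.0
    # Hypothetical 3072 x 3072 projection adapted with rank 16:
    print(lora_param_count(3072, 3072, 16), 3072 * 3072)  # 98304 9437184
```

The design point is that only A and B are trained, roughly 1% of the parameters of the full matrix in this example, which is what makes training one small adapter per vehicle class practical when only a handful of images exist.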

📝 Abstract

Diffusion-based image synthesis has emerged as a promising source of synthetic training data for AI-based object detection and classification. In this work, we investigate whether images generated with diffusion can improve military vehicle detection under low-data conditions. We fine-tuned the text-to-image diffusion model FLUX.1 [dev] using LoRA with only 8 or 24 real images per class across 15 vehicle categories, resulting in class-specific diffusion models, which were used to generate new samples from automatically generated text prompts. The same real images were used to fine-tune the RF-DETR detector for a 15-class object detection task. Synthetic datasets generated by the diffusion models were then used to further improve detector performance. Importantly, no additional real data was required, as the generative models leveraged the same limited training samples. FLUX-generated images improved detection performance, particularly in the low-data regime (up to +8.0% mAP50 with 8 real samples). To address the limited geometric control of text-prompt-based diffusion, we additionally generated structurally guided synthetic data using ControlNet with Canny edge-map conditioning, yielding a FLUX-ControlNet (FLUX-CN) dataset with explicit control over viewpoint and pose. Structural guidance further enhanced performance when data was scarce (+4.1% mAP50 with 8 real samples), but no additional benefit was observed when more real data was available. This study demonstrates that object-specific diffusion models are effective for improving military object detection in a low-data domain, and that structural guidance is most beneficial when real data is highly limited. These results highlight generative image data as an alternative to traditional simulation pipelines for the training of military AI systems.
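The gains above are reported in mAP50: average precision at an IoU threshold of 0.5, averaged over the classes. The following plain-Python sketch shows how such a number is typically computed, assuming axis-aligned (x1, y1, x2, y2) boxes, greedy matching of score-sorted detections, and all-point interpolation of the precision-recall curve. It is not the evaluator used in the paper, which the abstract does not name; COCO-style tooling such as pycocotools uses 101-point interpolation, so absolute values can differ slightly.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]        # (x1, y1, x2, y2)
Detection = Tuple[str, float, Box]             # (image_id, score, box)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections: List[Detection],
                      ground_truth: Dict[str, List[Box]],
                      iou_thr: float = 0.5) -> float:
    """AP for one class; each ground-truth box can be matched at most once."""
    n_gt = sum(len(boxes) for boxes in ground_truth.values())
    if n_gt == 0:
        return 0.0
    matched = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}
    tp_flags: List[bool] = []
    for img, _score, box in sorted(detections, key=lambda d: d[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(ground_truth.get(img, [])):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        is_tp = best_j >= 0 and best_iou >= iou_thr and not matched[img][best_j]
        if is_tp:
            matched[img][best_j] = True   # later duplicates of this box become false positives
        tp_flags.append(is_tp)

    # Precision and recall after each detection, in descending score order.
    recalls, precisions, tp = [], [], 0
    for k, is_tp in enumerate(tp_flags, start=1):
        tp += is_tp
        recalls.append(tp / n_gt)
        precisions.append(tp / k)
    # All-point interpolation: make precision non-increasing from right to left.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_ap50(per_class: Dict[str, Tuple[List[Detection], Dict[str, List[Box]]]]) -> float:
    """Mean of per-class AP at IoU 0.5 (mAP50)."""
    aps = [average_precision(dets, gts, 0.5) for dets, gts in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0


if __name__ == "__main__":
    gt = {"img1": [(0, 0, 10, 10)], "img2": [(0, 0, 10, 10)]}
    dets = [("img1", 0.9, (0, 0, 10, 10)),      # true positive
            ("img2", 0.8, (50, 50, 60, 60)),    # false positive, no overlap
            ("img2", 0.7, (1, 1, 10, 10))]      # true positive, IoU 0.81
    print(round(average_precision(dets, gt), 4))  # 0.8333
```

Because a detection only counts if its IoU with an unmatched ground-truth box reaches 0.5, mAP50 is a relatively lenient localization criterion, which suits comparing how much synthetic data helps the detector find each vehicle class at all.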

Problem

Research questions and friction points this paper is trying to address.

low-data domain

military object detection

diffusion models

synthetic data

object detection

Innovation

Methods, ideas, or system contributions that make the work stand out.

class-specific diffusion models

low-data object detection

ControlNet
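The class-specific diffusion models listed above were sampled with automatically generated text prompts, but the abstract does not describe how those prompts were built. A minimal sketch, assuming a simple template that combines a hypothetical LoRA trigger token (`sks`) and class name with randomly drawn viewpoints, environments, and lighting; the vocabularies, the function name `make_prompts`, and the class name are illustrative, not the authors' pipeline.

```python
import random
from typing import List

# Hypothetical attribute vocabularies; the paper's actual prompt vocabulary is not given.
VIEWPOINTS = ["front view", "side view", "rear view", "three-quarter view", "aerial view"]
ENVIRONMENTS = ["on a desert road", "in a forest clearing", "in a snowy field",
                "on an urban street", "on a muddy track"]
LIGHTING = ["overcast daylight", "bright sunlight", "at dusk", "in light fog"]


def make_prompts(class_name: str, trigger: str, n: int, seed: int = 0) -> List[str]:
    """Build n distinct prompts for one class-specific LoRA, sampled without repetition."""
    rng = random.Random(seed)  # seeded so a generated dataset can be reproduced
    combos = [(v, e, l) for v in VIEWPOINTS for e in ENVIRONMENTS for l in LIGHTING]
    if n > len(combos):
        raise ValueError(f"only {len(combos)} unique prompts available")
    chosen = rng.sample(combos, n)
    return [f"a photo of {trigger} {class_name}, {v}, {e}, {l}" for v, e, l in chosen]


if __name__ == "__main__":
    for prompt in make_prompts("main battle tank", "sks", 3, seed=42):
        print(prompt)
```

Varying viewpoint in the prompt text is exactly the weak form of geometric control that the authors supplement with Canny-conditioned ControlNet, since a text prompt alone does not guarantee the requested pose.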