Visual Prototype Conditioned Focal Region Generation for UAV-Based Object Detection

📅 2026-04-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses UAV-based object detection in dynamically changing scenarios where annotated training data is scarce. Existing layout-to-image generation methods often produce artifacts near the layout boundaries of small foreground objects, which degrades detection performance. To mitigate this, the authors propose UAVGen, a framework that combines a Visual Prototype Conditioned Diffusion Model (VPC-DM), which conditions generation on representative instances of each class, with a Focal Region Enhanced Data Pipeline (FRE-DP), which emphasizes object-concentrated foreground regions. Together they aim to improve the fidelity of generated small objects and suppress layout-boundary artifacts. A label refinement step additionally corrects missing, extra, and misaligned generations in the synthetic annotations. According to the reported experiments, the synthesized data consistently improves detection accuracy across multiple detectors.

📝 Abstract

Unmanned aerial vehicle (UAV) based object detection is a critical but challenging task when applied in dynamically changing scenarios with limited annotated training data. Layout-to-image generation approaches have proved effective in improving detection accuracy by synthesizing labeled images with diffusion models. However, they frequently produce artifacts, especially near the layout boundaries of tiny objects, which substantially limits their performance. To address these issues, we propose UAVGen, a novel layout-to-image generation framework tailored for UAV-based object detection. Specifically, UAVGen designs a Visual Prototype Conditioned Diffusion Model (VPC-DM) that constructs representative instances for each class and integrates them into latent embeddings for high-fidelity object generation. Moreover, a Focal Region Enhanced Data Pipeline (FRE-DP) is introduced to emphasize object-concentrated foreground regions in synthesis, combined with a label refinement step that corrects missing, extra, and misaligned generations. Extensive experimental results demonstrate that our method significantly outperforms state-of-the-art approaches and consistently improves accuracy when integrated with distinct detectors. The source code is available at https://github.com/Sirius-Li/UAVGen.
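The abstract does not spell out how the label refinement works, so the following is only a minimal sketch of one plausible reading: compare the input layout boxes with boxes found by a detector on the generated image, then drop layout boxes with no match (missing generations), snap matched boxes to the detected position (misaligned generations), and add unmatched detections (extra generations). The function names `iou` and `refine_labels`, the greedy matching, and the 0.5 IoU threshold are all assumptions made for illustration and are not taken from the UAVGen code. Class labels are ignored for brevity.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def refine_labels(layout: List[Box], detected: List[Box],
                  match_iou: float = 0.5) -> List[Box]:
    """Hypothetical refinement of synthetic labels (not the authors' method).

    layout:   boxes the generator was asked to draw
    detected: boxes a detector found on the generated image
    """
    refined: List[Box] = []
    used = set()  # indices of detections already matched
    for box in layout:
        # Greedily pick the best still-unused detection for this layout box.
        best_j, best_score = -1, 0.0
        for j, det in enumerate(detected):
            score = iou(box, det)
            if j not in used and score > best_score:
                best_j, best_score = j, score
        if best_j >= 0 and best_score >= match_iou:
            used.add(best_j)
            refined.append(detected[best_j])  # misaligned: use detected position
        # else: treated as a missing generation, so the label is dropped
    # Detections without a layout counterpart are treated as extra objects.
    refined.extend(d for j, d in enumerate(detected) if j not in used)
    return refined
```

In this sketch a layout box that the generator failed to draw simply disappears from the labels, which keeps the synthetic annotations consistent with what is actually visible in the image. A real pipeline would likely also filter detections by confidence and match only within the same class.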

Problem

Research questions and friction points this paper is trying to address.

UAV-based object detection

layout-to-image generation

artifact reduction

limited annotated data

tiny object detection

Innovation

Methods, ideas, or system contributions that make the work stand out.

Visual Prototype Conditioned Diffusion Model

Focal Region Enhanced Data Pipeline

Layout-to-Image Generation
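As a rough illustration of what "emphasizing object-concentrated foreground regions" could mean in practice, the sketch below slides a fixed-size window over an image and returns the crop that contains the most object centers. This is an assumed simplification for illustration only, not the FRE-DP procedure from the paper; the function name `densest_crop` and the `crop` and `stride` parameters are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def densest_crop(boxes: List[Box], img_w: int, img_h: int,
                 crop: int, stride: int) -> Tuple[Tuple[int, int, int, int], int]:
    """Return the square window holding the most box centers, plus that count.

    Hypothetical helper: windows are scanned row by row, and the first
    window with the highest count wins ties.
    """
    best_window = (0, 0, crop, crop)
    best_count = -1
    for y in range(0, max(img_h - crop, 0) + 1, stride):
        for x in range(0, max(img_w - crop, 0) + 1, stride):
            count = 0
            for x1, y1, x2, y2 in boxes:
                cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
                # Count a box when its center falls inside the window.
                if x <= cx < x + crop and y <= cy < y + crop:
                    count += 1
            if count > best_count:
                best_window, best_count = (x, y, x + crop, y + crop), count
    return best_window, best_count
```

Cropping to such a window before synthesis would make tiny objects occupy a larger share of the generated image, which is one intuitive way a focal-region strategy might help small-object fidelity.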