Visual Prototype Conditioned Focal Region Generation for UAV-Based Object Detection

📅 2026-04-03
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses UAV-based object detection in dynamically changing scenarios where annotated training data are scarce. Existing layout-to-image generation methods often produce artifacts near the layout boundaries of small foreground objects, which degrades detection performance. To mitigate this, the authors propose UAVGen, a framework that combines a Visual Prototype Conditioned Diffusion Model (VPC-DM), guided by category-level visual prototypes, with a Focal Region Enhanced Data Pipeline (FRE-DP). Together they improve the fidelity of generated small objects and suppress layout-boundary artifacts. A label refinement step further improves the quality of the synthetic annotations by correcting missing, extra, and misaligned objects. Experiments show that the synthesized data consistently improves detection accuracy across multiple detectors, supporting the framework's effectiveness and generalization capability.
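The summary does not say how the category-level prototypes are built. A minimal sketch, assuming each prototype is simply the L2-normalized mean of per-instance feature embeddings for its class, is shown below. The names `build_prototypes` and `l2_normalize` and the averaging scheme are hypothetical; the actual VPC-DM may select, encode, or inject prototypes differently.

```python
import math
from collections import defaultdict


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def build_prototypes(features, labels):
    """Hypothetical prototype construction: average the normalized instance
    embeddings of each class, then re-normalize the mean.

    features: list of equal-length float vectors, one per object instance
    labels:   list of class names, aligned with `features`
    Returns a dict mapping class name -> prototype vector.
    """
    sums = {}
    counts = defaultdict(int)
    for feat, label in zip(features, labels):
        unit = l2_normalize(feat)
        if label not in sums:
            sums[label] = [0.0] * len(unit)
        # Accumulate the normalized embedding for this class.
        sums[label] = [s + u for s, u in zip(sums[label], unit)]
        counts[label] += 1
    return {
        label: l2_normalize([s / counts[label] for s in total])
        for label, total in sums.items()
    }
```

For example, `build_prototypes([[1.0, 0.0], [0.0, 1.0], [3.0, 0.0]], ["car", "car", "person"])` yields a "person" prototype of `[1.0, 0.0]` and a "car" prototype of roughly `[0.7071, 0.7071]`. In a conditional diffusion model such vectors would typically be projected and combined with the other conditioning embeddings; the abstract states only that the prototypes are integrated into latent embeddings.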
πŸ“ Abstract
Unmanned aerial vehicle (UAV) based object detection is a critical but challenging task when applied in dynamically changing scenarios with limited annotated training data. Layout-to-image generation approaches have proved effective at improving detection accuracy by synthesizing labeled images with diffusion models. However, they frequently produce artifacts, especially near the layout boundaries of tiny objects, which substantially limits their performance. To address these issues, we propose UAVGen, a novel layout-to-image generation framework tailored for UAV-based object detection. Specifically, UAVGen designs a Visual Prototype Conditioned Diffusion Model (VPC-DM) that constructs representative instances for each class and integrates them into latent embeddings for high-fidelity object generation. Moreover, a Focal Region Enhanced Data Pipeline (FRE-DP) is introduced to emphasize object-concentrated foreground regions during synthesis, combined with a label refinement step that corrects missing, extra, and misaligned generations. Extensive experimental results demonstrate that our method significantly outperforms state-of-the-art approaches and consistently improves accuracy when integrated with distinct detectors. The source code is available at https://github.com/Sirius-Li/UAVGen.
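The abstract does not detail how label refinement works. One plausible sketch assumes a detector is run on each synthesized image and its predictions are matched to the input layout by IoU. Each layout box is then kept, re-aligned to the detected box, or marked missing, and unmatched detections are flagged as extra objects. The thresholds (`keep_thr`, `match_thr`), the greedy per-box matching, and the function names are all hypothetical, not the authors' procedure.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
Label = Tuple[str, Box]                  # (class_name, box)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def refine_labels(layout: List[Label], detections: List[Label],
                  keep_thr: float = 0.7, match_thr: float = 0.3):
    """Hypothetical label refinement for one synthesized image.

    - IoU >= keep_thr:             keep the layout box as-is
    - match_thr <= IoU < keep_thr: misaligned, adopt the detected box
    - no same-class match:         object likely not generated (missing)
    Detections never matched to a layout box are returned as `extra`.
    """
    used = set()
    refined, missing = [], []
    for cls, box in layout:
        # Greedily pick the best unused same-class detection for this box.
        best_j, best_iou = -1, 0.0
        for j, (det_cls, det_box) in enumerate(detections):
            if j in used or det_cls != cls:
                continue
            score = iou(box, det_box)
            if score > best_iou:
                best_j, best_iou = j, score
        if best_iou >= keep_thr:
            refined.append((cls, box))
            used.add(best_j)
        elif best_iou >= match_thr:
            refined.append((cls, detections[best_j][1]))
            used.add(best_j)
        else:
            missing.append((cls, box))
    extra = [d for j, d in enumerate(detections) if j not in used]
    return refined, missing, extra
```

Greedy matching is the simplest choice but depends on layout order; a Hungarian assignment over the IoU matrix would remove that dependence. How extra detections are treated (added as new labels or used to discard the image) is left to the caller here.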
Problem

Research questions and friction points this paper is trying to address.

UAV-based object detection
layout-to-image generation
artifact reduction
limited annotated data
tiny object detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Visual Prototype Conditioned Diffusion Model
Focal Region Enhanced Data Pipeline
Layout-to-Image Generation
UAV-Based Object Detection
High-Fidelity Object Synthesis
πŸ”Ž Similar Papers
No similar papers found.
Wenhao Li
State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, China; School of Computer Science and Engineering, Beihang University, Beijing, China
Zimeng Wu
State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, China; School of Computer Science and Engineering, Beihang University, Beijing, China
Yu Wu
University of Cambridge
machine learning, health sensing, mobile health
Zehua Fu
Hangzhou Innovation Institute, Beihang University, Hangzhou, China
Jiaxin Chen
State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, China; School of Computer Science and Engineering, Beihang University, Beijing, China