Adapting Vehicle Detectors for Aerial Imagery to Unseen Domains with Weak Supervision

📅 2025-07-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Aerial vehicle detectors trained in one geographic region often generalize poorly to others because environmental conditions, urban layouts, road networks, vehicle types, and image acquisition parameters differ. To address this domain shift, the paper proposes a multi-stage, multi-modal knowledge transfer framework built on fine-tuned latent diffusion models (LDMs). The LDMs synthesize high-quality aerial images together with their labels, and the synthetic data augments detector training to narrow the distribution gap between source and target domains. Experiments across diverse aerial imagery domains report AP₅₀ improvements of 4-23% over supervised learning on source-domain data, 6-10% over weakly supervised adaptation methods, 7-40% over unsupervised domain adaptation methods, and more than 50% over open-set object detectors. The paper also introduces two newly annotated aerial datasets from New Zealand and Utah to support further research.
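The results are reported in AP50, that is, average precision where a detection counts as correct if its intersection-over-union (IoU) with a ground-truth box is at least 0.5. The sketch below shows one common way to compute it for a single class on a single image. It is an illustration only: it assumes axis-aligned boxes in `(x1, y1, x2, y2)` form, greedy matching to the best unmatched ground-truth box, and all-point interpolation of the precision-recall curve. The function names `iou` and `ap50` are hypothetical, and the paper's exact evaluation protocol is not stated here and may differ.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), axis-aligned (assumption)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def ap50(detections: List[Tuple[float, Box]],
         ground_truth: List[Box],
         iou_thr: float = 0.5) -> float:
    """Average precision at a fixed IoU threshold (illustrative sketch).

    detections: (confidence score, box) pairs for one class on one image.
    ground_truth: ground-truth boxes for the same class and image.
    """
    if not ground_truth:
        return 0.0

    matched = [False] * len(ground_truth)
    tp_flags = []
    # Visit detections from most to least confident.
    for _score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(ground_truth):
            if matched[j]:
                continue  # each ground-truth box can be matched only once
            value = iou(box, gt)
            if value > best_iou:
                best_iou, best_j = value, j
        is_tp = best_j >= 0 and best_iou >= iou_thr
        if is_tp:
            matched[best_j] = True
        tp_flags.append(is_tp)

    # Precision and recall after each detection in ranked order.
    precisions, recalls, tp = [], [], 0
    for rank, flag in enumerate(tp_flags, start=1):
        tp += int(flag)
        precisions.append(tp / rank)
        recalls.append(tp / len(ground_truth))

    # Make precision non-increasing in recall (all-point interpolation).
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])

    # Area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

A benchmark-level AP50 would pool detections across all images before ranking them, and established evaluation toolkits handle details such as ignored regions that this sketch leaves out.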

📝 Abstract
Detecting vehicles in aerial imagery is a critical task with applications in traffic monitoring, urban planning, and defense intelligence. Deep learning methods have provided state-of-the-art (SOTA) results for this application. However, a significant challenge arises when models trained on data from one geographic region fail to generalize effectively to other areas. Variability in factors such as environmental conditions, urban layouts, road networks, vehicle types, and image acquisition parameters (e.g., resolution, lighting, and angle) leads to domain shifts that degrade model performance. This paper proposes a novel method that uses generative AI to synthesize high-quality aerial images and their labels, improving detector training through data augmentation. Our key contribution is the development of a multi-stage, multi-modal knowledge transfer framework utilizing fine-tuned latent diffusion models (LDMs) to mitigate the distribution gap between the source and target environments. Extensive experiments across diverse aerial imagery domains show consistent performance improvements in AP50 over supervised learning on source domain data, weakly supervised adaptation methods, unsupervised domain adaptation methods, and open-set object detectors by 4-23%, 6-10%, 7-40%, and more than 50%, respectively. Furthermore, we introduce two newly annotated aerial datasets from New Zealand and Utah to support further research in this field. Project page is available at: https://humansensinglab.github.io/AGenDA

Problem

Research questions and friction points this paper is trying to address.

Improving vehicle detection in unseen aerial domains
Addressing domain shifts across geographic regions
Enhancing detector training with generative AI

Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative AI synthesizes aerial images and labels
Multi-stage multi-modal knowledge transfer framework
Fine-tuned latent diffusion models reduce domain gaps