GiPL: Generative augmented iterative Pseudo-Labeling for Cross-Domain Few-Shot Object Detection

📅 2026-05-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses two challenges in cross-domain few-shot object detection: underutilized support sets and severe overfitting caused by extremely limited labeled instances. To this end, the authors propose GiPL, a dual-branch training framework. One branch enhances the support set through iterative pseudo-label self-training: pseudo-annotations generated by zero-shot inference are fused with the ground-truth labels, and the model is retrained on the result. The other branch uses large vision-language models to synthesize domain-aligned images with multi-object annotations for data augmentation. According to the summary, this is the first study to integrate iterative pseudo-label self-training with generative data augmentation based on vision-language models, improving model generalization and robustness. Experiments on the RUOD, CARPK, and CarDD benchmarks under 1/5/10-shot settings show consistent gains over existing state-of-the-art methods.
📝 Abstract
Vision-language foundation models have shown promising zero-shot generalization for Cross-Domain Few-Shot Object Detection (CD-FSOD). However, they face two critical challenges in fine-tuning: insufficient support set utilization due to sparse single-instance annotations, and severe overfitting under extremely limited target-domain samples. To address these issues, this paper proposes GiPL, an efficient two-branch training framework. In the first branch, we design an iterative pseudo-label self-training paradigm, which performs zero-shot inference on the support set to generate reliable pseudo-annotations, fuses them with ground-truth labels, and iteratively optimizes the model to fully exploit support set data. In the second branch, we introduce a generative data augmentation pipeline using large vision-language models, which synthesizes domain-aligned, multi-object annotated images to enrich training samples and suppress overfitting. Extensive experiments on three challenging CD-FSOD datasets (RUOD, CARPK, CarDD) under 1/5/10-shot settings demonstrate that GiPL consistently outperforms state-of-the-art methods with significant performance gains. Code is available at https://github.com/z-yaz/CDiscover (CDiscover).
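The first branch (predict on the support set, fuse predictions with ground truth, retrain, repeat) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how fusion is performed, so the confidence threshold, the IoU-based duplicate check, and the names `Detection`, `fuse_labels`, `iterative_pseudo_labeling`, `predict`, and `finetune` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass(frozen=True)
class Detection:
    box: Box
    label: str
    score: float = 1.0  # ground-truth boxes are given score 1.0


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def fuse_labels(
    ground_truth: Sequence[Detection],
    predictions: Sequence[Detection],
    score_thresh: float = 0.5,  # assumed value, not from the paper
    iou_thresh: float = 0.5,    # assumed value, not from the paper
) -> List[Detection]:
    """Keep every ground-truth box, then add confident predictions
    that do not overlap a box that has already been kept."""
    fused = list(ground_truth)
    for pred in sorted(predictions, key=lambda d: d.score, reverse=True):
        if pred.score < score_thresh:
            continue  # too uncertain to use as a pseudo-label
        if any(iou(pred.box, kept.box) >= iou_thresh for kept in fused):
            continue  # duplicates a ground-truth or earlier pseudo box
        fused.append(pred)
    return fused


def iterative_pseudo_labeling(
    images: Sequence[object],
    ground_truth: Sequence[Sequence[Detection]],
    predict: Callable[[object], List[Detection]],   # hypothetical detector call
    finetune: Callable[[Sequence[object], List[List[Detection]]], None],  # hypothetical
    rounds: int = 3,
) -> List[List[Detection]]:
    """Alternate between pseudo-labeling the support set and fine-tuning."""
    labels = [list(gt) for gt in ground_truth]
    for _ in range(rounds):
        preds = [predict(img) for img in images]
        # Always fuse against the original ground truth so that errors
        # from earlier rounds are not locked in.
        labels = [fuse_labels(gt, p) for gt, p in zip(ground_truth, preds)]
        finetune(images, labels)
    return labels
```

In the first round, `predict` would correspond to zero-shot inference with the pretrained detector; in later rounds it would be the fine-tuned model. Re-fusing against the original ground truth each round is one possible design choice, and the paper may handle this differently.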
Problem

Research questions and friction points this paper is trying to address.

Cross-Domain Few-Shot Object Detection
support set utilization
overfitting
pseudo-labeling
data scarcity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative Augmentation
Iterative Pseudo-Labeling
Cross-Domain Few-Shot Object Detection
Vision-Language Models
Self-Training