High-Quality Proposal Encoding and Cascade Denoising for Imaginary Supervised Object Detection

📅 2025-11-11
🤖 AI Summary
This work addresses three key bottlenecks in Imaginary Supervised Object Detection (ISOD), where detectors are trained on synthetic images and evaluated on real ones: (1) low-quality synthetic data, caused by simplistic prompts, poor image quality, and weak supervision; (2) slow convergence and overfitting of DETR-style detectors due to random query initialization; and (3) heightened sensitivity to pseudo-label noise induced by uniform denoising. To tackle these, we propose Cascade HQP-DETR, which combines high-quality proposal-guided query initialization with cascade denoising. Specifically, we generate high-fidelity synthetic data (FluxVOC and FluxCOCO) using LLaMA-3, Flux, and Grounding DINO; initialize DETR queries from SAM-generated proposals and RoI-pooled features; and train with a cascade denoising strategy whose IoU thresholds increase across decoder layers. Trained solely on FluxVOC for 12 epochs, the method achieves 61.04% mAP@0.5 on PASCAL VOC 2007, outperforming strong baselines, while the data pipeline moves ISOD from weak to full supervision, with Grounding DINO supplying box-level annotations.
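As a rough illustration of how the three models could be chained into a fully annotated dataset, the sketch below passes each model in as a plain function, so no real LLaMA-3, Flux, or Grounding DINO API is assumed. The names `build_synthetic_dataset`, `Sample`, and the toy stand-ins, the `score_threshold` cut-off, and the choice to drop images with no surviving box are all hypothetical, not the paper's pipeline.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# (x1, y1, x2, y2) in pixel coordinates
Box = Tuple[float, float, float, float]
# A detector hit: (box, class name, confidence score)
Detection = Tuple[Box, str, float]


@dataclass
class Sample:
    prompt: str
    image: object  # whatever the image generator returns
    boxes: List[Box] = field(default_factory=list)
    labels: List[str] = field(default_factory=list)


def build_synthetic_dataset(
    classes: List[str],
    write_prompt: Callable[[str], str],                      # stand-in for an LLM (e.g. LLaMA-3)
    generate_image: Callable[[str], object],                 # stand-in for a text-to-image model (e.g. Flux)
    detect: Callable[[object, List[str]], List[Detection]],  # stand-in for an open-set detector (e.g. Grounding DINO)
    images_per_class: int = 1,
    score_threshold: float = 0.35,                           # hypothetical confidence cut-off
) -> List[Sample]:
    """Prompt -> image -> box annotations, keeping only confident in-vocabulary boxes."""
    dataset: List[Sample] = []
    for cls in classes:
        for _ in range(images_per_class):
            prompt = write_prompt(cls)
            image = generate_image(prompt)
            # Label every dataset class the detector finds, not only the prompted one,
            # so each image carries box-level (full) annotations.
            kept = [
                (box, label)
                for box, label, score in detect(image, classes)
                if score >= score_threshold and label in classes
            ]
            if not kept:
                continue  # discard images with no reliable box
            dataset.append(
                Sample(prompt, image, [b for b, _ in kept], [lbl for _, lbl in kept])
            )
    return dataset


# Usage with toy stand-ins (no real model is called).
def toy_prompt(cls: str) -> str:
    return f"a photo of a {cls} in a park"


def toy_image(prompt: str) -> dict:
    return {"prompt": prompt}


def toy_detector(image: object, names: List[str]) -> List[Detection]:
    return [((10.0, 20.0, 110.0, 220.0), "dog", 0.9), ((0.0, 0.0, 5.0, 5.0), "cat", 0.1)]


demo = build_synthetic_dataset(["dog", "cat"], toy_prompt, toy_image, toy_detector)
```

Injecting the models as callables keeps the orchestration testable without GPUs; in practice each callable would wrap the corresponding model's own inference code.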


📝 Abstract
Object detection models demand large-scale annotated datasets, which are costly and labor-intensive to create. This has motivated Imaginary Supervised Object Detection (ISOD), in which models are trained on synthetic images and tested on real images. However, existing methods face three limitations: (1) synthetic datasets suffer from simplistic prompts, poor image quality, and weak supervision; (2) because of random query initialization, DETR-based detectors converge slowly and overfit to synthetic patterns, which hinders real-world generalization; (3) uniform denoising pressure encourages the model to overfit to pseudo-label noise. We propose Cascade HQP-DETR to address these limitations. First, we introduce a high-quality data pipeline that uses LLaMA-3, Flux, and Grounding DINO to generate the FluxVOC and FluxCOCO datasets, advancing ISOD from weak to full supervision. Second, our High-Quality Proposal (HQP)-guided query encoding initializes object queries with image-specific priors from SAM-generated proposals and RoI-pooled features, which accelerates convergence and steers the model toward transferable features instead of synthetic patterns. Third, our cascade denoising algorithm dynamically adjusts training weights through IoU thresholds that increase progressively across decoder layers, guiding the model to learn robust boundaries from reliable visual cues rather than overfitting to noisy labels. Trained for just 12 epochs solely on FluxVOC, Cascade HQP-DETR achieves a state-of-the-art 61.04% mAP@0.5 on PASCAL VOC 2007, outperforming strong baselines, and its competitive performance on real training data confirms the architecture's general applicability.
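The proposal-guided query initialization can be pictured with a minimal pure-Python sketch: proposal boxes (assumed here to be bounding boxes already derived from SAM masks, each with a score) are ranked, average-pooled from a backbone feature map to form content queries, and converted to normalized (cx, cy, w, h) anchors for positional queries. The function names, the simple grid average pooling used in place of RoI Align, the pooling grid size, and zero-padding when there are too few proposals are all assumptions for illustration, not the paper's implementation.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in feature-map cells
FeatureMap = List[List[List[float]]]     # indexed as fmap[y][x][channel]


def _cells(start: float, end: float, limit: int) -> range:
    """Integer cell indices covered by [start, end), clamped to the map, never empty."""
    lo = min(max(math.floor(start), 0), limit - 1)
    hi = min(max(math.ceil(end), lo + 1), limit)
    return range(lo, hi)


def roi_avg_pool(fmap: FeatureMap, box: Box, out_size: int = 2) -> List[float]:
    """Average-pool `box` into an out_size x out_size grid, flattened bin by bin, then by channel."""
    height, width, channels = len(fmap), len(fmap[0]), len(fmap[0][0])
    x1, y1, x2, y2 = box
    bin_w, bin_h = (x2 - x1) / out_size, (y2 - y1) / out_size
    pooled: List[float] = []
    for by in range(out_size):
        ys = _cells(y1 + by * bin_h, y1 + (by + 1) * bin_h, height)
        for bx in range(out_size):
            xs = _cells(x1 + bx * bin_w, x1 + (bx + 1) * bin_w, width)
            n = len(ys) * len(xs)
            for c in range(channels):
                pooled.append(sum(fmap[y][x][c] for y in ys for x in xs) / n)
    return pooled


def init_queries(
    fmap: FeatureMap,
    proposals: Sequence[Tuple[Box, float]],  # (box, score), e.g. boxes around SAM masks
    num_queries: int,
    out_size: int = 2,
) -> Tuple[List[List[float]], List[Tuple[float, float, float, float]]]:
    """Return (content queries, normalized (cx, cy, w, h) anchors) for the top-scoring proposals."""
    height, width = len(fmap), len(fmap[0])
    ranked = sorted(proposals, key=lambda p: p[1], reverse=True)[:num_queries]
    content: List[List[float]] = []
    anchors: List[Tuple[float, float, float, float]] = []
    for (x1, y1, x2, y2), _score in ranked:
        content.append(roi_avg_pool(fmap, (x1, y1, x2, y2), out_size))
        anchors.append(((x1 + x2) / 2 / width, (y1 + y2) / 2 / height,
                        (x2 - x1) / width, (y2 - y1) / height))
    # Assumption: pad missing slots with empty content and a whole-image anchor.
    dim = out_size * out_size * len(fmap[0][0])
    while len(content) < num_queries:
        content.append([0.0] * dim)
        anchors.append((0.5, 0.5, 1.0, 1.0))
    return content, anchors
```

In a real DETR-style decoder the flattened pooled vectors would still pass through a learned projection to the hidden size; that step is omitted to keep the sketch dependency-free.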
Problem

Overcoming the limitations of synthetic data in Imaginary Supervised Object Detection
Addressing slow convergence and overfitting of DETR-based detectors trained on synthetic data
Reducing model overfitting to pseudo-label noise through dynamic denoising
Innovation

High-quality data pipeline generates fully annotated synthetic datasets (FluxVOC, FluxCOCO)
Proposal-guided query encoding with SAM priors accelerates convergence
Cascade denoising adjusts training weights with IoU thresholds that increase across decoder layers
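A minimal sketch of IoU-thresholded cascade weighting, loosely modeled on the Cascade R-CNN idea of raising the IoU threshold stage by stage. It assumes that the box fed into decoder layer l for each denoising query is compared with that query's target box, that queries below the layer's threshold receive a reduced weight (`low_weight`, zero by default), and that thresholds rise linearly from 0.5 to 0.75. The function names, the linear schedule, and the hard weighting rule are illustrative assumptions; the abstract does not give the paper's exact formula.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def cascade_thresholds(num_layers: int, start: float = 0.5, end: float = 0.75) -> List[float]:
    """Hypothetical linear schedule: one IoU threshold per decoder layer, rising from start to end."""
    if num_layers == 1:
        return [start]
    step = (end - start) / (num_layers - 1)
    return [start + i * step for i in range(num_layers)]


def cascade_weights(
    layer_boxes: Sequence[Sequence[Box]],  # layer_boxes[l][i]: box fed to layer l for denoising query i
    targets: Sequence[Box],                # target box of each denoising query
    thresholds: Sequence[float],           # one IoU threshold per layer
    low_weight: float = 0.0,               # assumed weight for queries below the layer's threshold
) -> List[List[float]]:
    """weights[l][i] = 1.0 if IoU(layer_boxes[l][i], targets[i]) >= thresholds[l], else low_weight."""
    return [
        [1.0 if iou(box, tgt) >= thr else low_weight for box, tgt in zip(boxes, targets)]
        for boxes, thr in zip(layer_boxes, thresholds)
    ]


def weighted_l1(preds: Sequence[Box], targets: Sequence[Box], weights: Sequence[float]) -> float:
    """Box L1 loss normalized by the total weight of the queries that count at this layer."""
    total = sum(weights)
    if total == 0:
        return 0.0
    loss = sum(
        w * sum(abs(p - t) for p, t in zip(pb, tb))
        for w, pb, tb in zip(weights, preds, targets)
    )
    return loss / total
```

Because later layers demand higher overlap, loosely matched denoising queries stop contributing as depth increases, so the deepest layers are supervised only by the most reliable boxes.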
Zhiyuan Chen
Institute of Cyberspace Security, Harbin Institute of Technology, Shenzhen, University Town of Shenzhen, Nanshan District, Shenzhen, 518055, Guangdong, China
Yuelin Guo
Institute of Cyberspace Security, Harbin Institute of Technology, Shenzhen, University Town of Shenzhen, Nanshan District, Shenzhen, 518055, Guangdong, China
Zitong Huang
Center on Machine Learning Research, Harbin Institute of Technology, No. 92 West Dazhi Street, Nangang District, Harbin, 150001, Heilongjiang, China
Haoyu He
Faculty of Information Technology, Monash University, Wellington Road, Clayton, 3800, Victoria, Australia
Renhao Lu
Department of New Networks, Peng Cheng Laboratory, Xili Lake International Science and Education City, Nanshan District, Shenzhen, 518066, Guangdong, China
Weizhe Zhang
Professor, Peng Cheng Laboratory & Harbin Institute of Technology
Parallel and Distributed Systems, Cloud Computing, Real-time Scheduling, Computer Networks