Beyond Boundaries: Leveraging Vision Foundation Models for Source-Free Object Detection

📅 2025-11-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
In source-free object detection (SFOD), where source-domain data is unavailable, existing methods rely solely on the internal knowledge of the pre-trained source model, leading to biased pseudo-labels and limited generalization and discriminability. This paper proposes the first SFOD framework that leverages vision foundation models (VFMs) as external knowledge sources. It introduces three key innovations: (1) patch-weighted global feature alignment, (2) prototype-guided instance-level feature alignment, and (3) entropy-aware fusion of dual-source predictions for pseudo-label refinement. By jointly optimizing feature alignment and high-quality pseudo-label generation, the method reduces over-reliance on the source model's internal representations. Evaluated on six standard cross-domain benchmarks, it achieves state-of-the-art performance, significantly improving transferability and class discrimination, and empirically validates the role of VFMs as effective external knowledge in source-free detection.
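The entropy-aware fusion idea can be pictured concretely: given class distributions for the same detection from a detection VFM and from the teacher model, the more confident (lower-entropy) source receives a larger weight. The NumPy sketch below is a minimal illustration under assumed details; the `exp(-H)` weighting and per-box normalization are our own illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def entropy(p):
    """Shannon entropy of each class distribution (rows of p)."""
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log(p), axis=-1)

def fuse_predictions(teacher_probs, vfm_probs):
    """Entropy-aware fusion of dual-source class distributions.

    teacher_probs, vfm_probs: (N, C) class probabilities for the same
    N boxes from the teacher model and a detection VFM.
    The lower-entropy (more confident) source gets a larger weight.
    """
    w_t = np.exp(-entropy(teacher_probs))   # inverse-entropy weights
    w_v = np.exp(-entropy(vfm_probs))
    z = w_t + w_v                            # per-box normalizer
    return (w_t[:, None] * teacher_probs + w_v[:, None] * vfm_probs) / z[:, None]
```

With a confident teacher (e.g. `[0.98, 0.01, 0.01]`) and a near-uniform VFM prediction, the fused distribution stays dominated by the teacher's class while remaining a valid probability vector.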

📝 Abstract
Source-Free Object Detection (SFOD) aims to adapt a source-pretrained object detector to a target domain without access to source data. However, existing SFOD methods predominantly rely on internal knowledge from the source model, which limits their capacity to generalize across domains and often results in biased pseudo-labels, thereby hindering both transferability and discriminability. In contrast, Vision Foundation Models (VFMs), pretrained on massive and diverse data, exhibit strong perception capabilities and broad generalization, yet their potential remains largely untapped in the SFOD setting. In this paper, we propose a novel SFOD framework that leverages VFMs as external knowledge sources to jointly enhance feature alignment and label quality. Specifically, we design three VFM-based modules: (1) Patch-weighted Global Feature Alignment (PGFA) distills global features from VFMs using patch-similarity-based weighting to enhance global feature transferability; (2) Prototype-based Instance Feature Alignment (PIFA) performs instance-level contrastive learning guided by momentum-updated VFM prototypes; and (3) Dual-source Enhanced Pseudo-label Fusion (DEPF) fuses predictions from detection VFMs and teacher models via an entropy-aware strategy to yield more reliable supervision. Extensive experiments on six benchmarks demonstrate that our method achieves state-of-the-art SFOD performance, validating the effectiveness of integrating VFMs to simultaneously improve transferability and discriminability.
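As one way to picture the PGFA module described above, the sketch below weights each patch by the cosine agreement between detector and VFM features, then distills a weighted VFM global target into the detector's global feature. The function names, the softmax weighting, and the cosine-distance loss are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def _normalize(x):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-8)

def pgfa_loss(det_patches, vfm_patches, det_global):
    """Patch-weighted global feature alignment (illustrative sketch).

    det_patches, vfm_patches: (N, D) per-patch features from the detector
    backbone and the VFM for the same image.
    det_global: (D,) pooled global feature of the detector.
    Patches where the two models agree (high cosine similarity) receive
    a larger weight in the distillation target.
    """
    sims = np.sum(_normalize(det_patches) * _normalize(vfm_patches), axis=-1)
    weights = np.exp(sims) / np.exp(sims).sum()   # softmax over patches
    target = weights @ vfm_patches                # (D,) weighted VFM global target
    # cosine-distance distillation loss between detector global and target
    return 1.0 - float(_normalize(det_global) @ _normalize(target))
```

When detector and VFM patch features coincide, the weights become uniform and the loss vanishes for a mean-pooled global feature; mismatched features yield a positive cosine-distance penalty.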
Problem

Research questions and friction points this paper is trying to address.

Adapting object detectors across domains without source data access
Overcoming biased pseudo-labels and limited generalization in SFOD
Integrating Vision Foundation Models to enhance feature alignment and label quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Vision Foundation Models as external knowledge sources
Aligns features via patch-weighted and prototype-based modules
Fuses pseudo-labels from VFMs and teacher models
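The prototype-based alignment above (PIFA in the abstract) can be sketched as momentum-updated class prototypes built from VFM instance features, with an InfoNCE-style loss pulling each detector instance feature toward its class prototype and away from the others. This is a hedged sketch with assumed hyperparameters (momentum 0.9, temperature 0.1), not the authors' implementation.

```python
import numpy as np

def _normalize(x):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-8)

def update_prototypes(prototypes, vfm_feats, labels, momentum=0.9):
    """EMA (momentum) update of per-class prototypes from VFM instance features.

    prototypes: (C, D) current class prototypes.
    vfm_feats:  (N, D) VFM features of pseudo-labeled instances.
    labels:     (N,) pseudo-label class index per instance.
    """
    protos = prototypes.copy()
    for c in np.unique(labels):
        mean_feat = vfm_feats[labels == c].mean(axis=0)
        protos[c] = momentum * protos[c] + (1 - momentum) * mean_feat
    return protos

def info_nce_loss(inst_feats, labels, prototypes, temperature=0.1):
    """Instance-prototype contrastive loss (InfoNCE-style sketch)."""
    logits = _normalize(inst_feats) @ _normalize(prototypes).T / temperature
    logits -= logits.max(axis=1, keepdims=True)             # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # cross-entropy against each instance's own class prototype
    return -float(np.mean(log_probs[np.arange(len(labels)), labels]))
```

The momentum update keeps prototypes stable against noisy pseudo-labels, while the contrastive term sharpens class discriminability of the detector's instance features.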