Test-Time Adaptive Object Detection with Foundation Model

📅 2025-10-29
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing test-time adaptation (TTA) methods for object detection rely on source-domain statistics and assume a closed-set category space, rendering them inadequate for open-world scenarios. To address this, we propose the first source-data-free, open-vocabulary TTA framework grounded in vision-language foundation models. Our approach introduces a multimodal prompt-based Mean-Teacher architecture that jointly leverages text and visual prompts for parameter-efficient fine-tuning. It incorporates an Instance Dynamic Memory (IDM) module to maintain category-agnostic instance representations and integrates memory-enhanced pseudo-labeling with controlled hallucination to improve label quality under domain shift. Evaluated on cross-corruption and cross-dataset benchmarks, our method significantly outperforms state-of-the-art TTA approaches. Notably, it enables continuous, cross-domain and cross-category adaptation for arbitrary target domains, marking the first such capability in TTA for object detection.
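The Mean-Teacher part of this design can be pictured as an EMA copy of the student in which only the prompt tensors move while the foundation-model weights stay frozen. Below is a minimal sketch of that idea, assuming PyTorch and assuming the learnable prompt parameters are identifiable by name; `ema_update_prompts` and the `momentum` value are illustrative assumptions, not the paper's actual API.

```python
import torch

@torch.no_grad()
def ema_update_prompts(teacher, student, momentum=0.999):
    """Blend only the learnable prompt tensors from student into teacher
    via an exponential moving average; all other (frozen) foundation-model
    weights are left untouched, keeping the update parameter-efficient."""
    student_params = dict(student.named_parameters())
    for name, t_param in teacher.named_parameters():
        if "prompt" in name:  # text or visual prompt tokens only (assumed naming)
            s_param = student_params[name]
            t_param.mul_(momentum).add_(s_param, alpha=1.0 - momentum)
```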

📝 Abstract
In recent years, test-time adaptive object detection has attracted increasing attention due to its unique advantages in online domain adaptation, which aligns more closely with real-world application scenarios. However, existing approaches heavily rely on source-derived statistical characteristics while making the strong assumption that the source and target domains share an identical category space. In this paper, we propose the first foundation model-powered test-time adaptive object detection method that eliminates the need for source data entirely and overcomes traditional closed-set limitations. Specifically, we design a Multi-modal Prompt-based Mean-Teacher framework for vision-language detector-driven test-time adaptation, which incorporates text and visual prompt tuning to adapt both language and vision representation spaces on the test data in a parameter-efficient manner. Correspondingly, we propose a Test-time Warm-start strategy tailored for the visual prompts to effectively preserve the representation capability of the vision branch. Furthermore, to guarantee high-quality pseudo-labels in every test batch, we maintain an Instance Dynamic Memory (IDM) module that stores high-quality pseudo-labels from previous test samples, and propose two novel strategies, Memory Enhancement and Memory Hallucination, to leverage IDM's high-quality instances for enhancing original predictions and hallucinating images without available pseudo-labels, respectively. Extensive experiments on cross-corruption and cross-dataset benchmarks demonstrate that our method consistently outperforms previous state-of-the-art methods, and can adapt to arbitrary cross-domain and cross-category target data. Code is available at https://github.com/gaoyingjay/ttaod_foundation.
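To make the IDM idea concrete, here is a minimal sketch of a bounded, confidence-filtered instance buffer, assuming Python/PyTorch; the `Instance` fields, the capacity, the threshold `tau`, and the FIFO eviction rule are assumptions for illustration rather than the paper's exact design.

```python
import random
from collections import deque
from dataclasses import dataclass
import torch

@dataclass
class Instance:
    crop: torch.Tensor   # 3xHxW image patch of the detected object
    box: torch.Tensor    # (4,) box in the source image, xyxy
    label: str           # open-vocabulary category name
    score: float         # detector confidence

class InstanceDynamicMemory:
    def __init__(self, capacity=256, tau=0.8):
        self.buffer = deque(maxlen=capacity)  # FIFO eviction when full
        self.tau = tau                        # confidence threshold (assumed)

    def update(self, instances):
        """Retain only high-confidence pseudo-labeled instances
        from the current test batch."""
        for inst in instances:
            if inst.score >= self.tau:
                self.buffer.append(inst)

    def sample(self, k=4):
        """Draw up to k stored instances at random, e.g. to enhance
        the predictions of the current batch."""
        k = min(k, len(self.buffer))
        return random.sample(list(self.buffer), k)
```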
Problem

Research questions and friction points this paper is trying to address.

Develops test-time adaptive object detection without source data dependency
Overcomes closed-set limitations using vision-language foundation models
Enables adaptation to arbitrary cross-domain and cross-category target data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-modal Prompt-based Mean-Teacher framework for adaptation
Test-time Warm-start strategy preserves vision representation capability
Instance Dynamic Memory module enhances predictions and hallucinates images lacking pseudo-labels (see the sketch below)
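The Memory Hallucination strategy can be sketched as pasting stored memory instances into a test image that produced no usable pseudo-labels, so the student still receives supervision on that image. The random placement and hard pasting below are simplified assumptions; `hallucinate` builds on the `InstanceDynamicMemory` sketch above and is not the paper's exact procedure.

```python
import random
import torch

def hallucinate(image, memory, k=2):
    """Paste up to k memory crops at random locations in `image` (3xHxW);
    return the augmented image plus the pseudo-boxes and labels implied
    by the pasted instances."""
    H, W = image.shape[-2:]
    boxes, labels = [], []
    for inst in memory.sample(k):
        h, w = inst.crop.shape[-2:]
        if h >= H or w >= W:
            continue  # skip crops that do not fit the target image
        y = random.randint(0, H - h)
        x = random.randint(0, W - w)
        image[..., y:y + h, x:x + w] = inst.crop
        boxes.append(torch.tensor([x, y, x + w, y + h], dtype=torch.float32))
        labels.append(inst.label)
    return image, boxes, labels
```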
Authors
Yingjie Gao (Beihang University)
Yanan Zhang (School of Computer Science and Information Engineering, Hefei University of Technology)
Zhi Cai (State Key Laboratory of Complex and Critical Software Environment, Beihang University; School of Computer Science and Engineering, Beihang University)
Di Huang (State Key Laboratory of Complex and Critical Software Environment, Beihang University; School of Computer Science and Engineering, Beihang University)