Embodied Domain Adaptation for Object Detection

📅 2025-06-26

📈 Citations: 0

✨ Influential: 0

career value

211K/year

🤖 AI Summary

Open-vocabulary object detection (OVOD) for mobile robots degrades significantly in dynamic indoor environments—e.g., under lighting variations, furniture rearrangements, or novel object appearances—due to domain shift and the absence of source-domain data during deployment. Method: We propose the first source-free domain adaptation (SFDA) framework tailored for embodied intelligence, which jointly integrates temporal clustering–guided pseudo-label refinement, multi-scale threshold fusion, and contrastive learning–enhanced Mean Teacher supervision—all without accessing source-domain data. Contribution/Results: We further introduce EDAOD, the first benchmark capturing temporally evolving domain shifts across illumination, layout, and object composition. Experiments demonstrate that our method achieves a 12.3% relative improvement in mAP over baseline methods on zero-shot detection tasks, substantially enhancing both rapid adaptation and robustness in real-world home and lab settings.

Technology Category

Application Category

📝 Abstract

Mobile robots rely on object detectors for perception and object localization in indoor environments. However, standard closed-set methods struggle to handle the diverse objects and dynamic conditions encountered in real homes and labs. Open-vocabulary object detection (OVOD), driven by Vision Language Models (VLMs), extends beyond fixed labels but still struggles with domain shifts in indoor environments. We introduce a Source-Free Domain Adaptation (SFDA) approach that adapts a pre-trained model without accessing source data. We refine pseudo labels via temporal clustering, employ multi-scale threshold fusion, and apply a Mean Teacher framework with contrastive learning. Our Embodied Domain Adaptation for Object Detection (EDAOD) benchmark evaluates adaptation under sequential changes in lighting, layout, and object diversity. Our experiments show significant gains in zero-shot detection performance and flexible adaptation to dynamic indoor conditions.

Problem

Research questions and friction points this paper is trying to address.

Adapting object detectors to dynamic indoor environments

Overcoming domain shifts in open-vocabulary object detection

Enhancing zero-shot detection in changing lighting and layouts

Innovation

Methods, ideas, or system contributions that make the work stand out.

Source-Free Domain Adaptation without source data

Temporal clustering refines pseudo labels

Mean Teacher framework with contrastive learning

🔎 Similar Papers

No similar papers found.

💼 Related Jobs

PhD GenAI Research Scientist Intern

Databricks

SF Bay Area Hourly Rate$54—$60 USD

San Francisco, CA, USA

Research Scientist Intern, Multimodal Generative AI and Robotics (PhD)