One Map to Find Them All: Real-time Open-Vocabulary Mapping for Zero-shot Multi-Object Navigation

📅 2024-09-18
🏛️ arXiv.org
📈 Citations: 0 · Influential: 0
🤖 AI Summary
This work addresses zero-shot multi-object navigation with a pretraining-free, reusable, real-time open-vocabulary semantic mapping method. Existing approaches rely on fixed vocabularies or task-specific pretraining, which limits generalization to unseen objects and hinders real-world deployment. Method: the authors introduce (1) the first zero-shot multi-object navigation benchmark; (2) a probabilistic semantic map update that explicitly models semantic uncertainty to guide active exploration; and (3) a lightweight integration of open-vocabulary vision models (e.g., CLIP) with real-time semantic mapping, enabling end-to-end deployment on edge hardware (Jetson Orin AGX). The framework supports both single- and multi-object search and leverages historical observations to accelerate localization of new targets. Results: extensive simulation and real-robot experiments demonstrate significant improvements over state-of-the-art methods, validating its effectiveness in complex environments and the feasibility of edge deployment.
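The summary describes a probabilistic semantic map update that fuses open-vocabulary features per cell, but not its equations. As a rough illustration only, here is a minimal sketch of one plausible confidence-weighted fusion of per-cell CLIP-style features; the class name, the weighting rule, and the query interface are all assumptions, not the authors' implementation:

```python
import numpy as np

class SemanticMap:
    """Hypothetical sketch (not the paper's code): each grid cell stores a
    fused open-vocabulary feature plus an accumulated confidence weight,
    updated by confidence-weighted averaging of new observations."""

    def __init__(self, h, w, d):
        self.feat = np.zeros((h, w, d))  # fused unit-norm feature per cell
        self.conf = np.zeros((h, w))     # accumulated observation confidence

    def update(self, cells, obs_feat, obs_conf):
        # cells: (N, 2) grid indices (assumed unique per call);
        # obs_feat: (N, d) unit-norm features; obs_conf: (N,) weights in [0, 1]
        r, c = cells[:, 0], cells[:, 1]
        w_old = self.conf[r, c][:, None]
        w_new = obs_conf[:, None]
        fused = (w_old * self.feat[r, c] + w_new * obs_feat) / np.maximum(w_old + w_new, 1e-8)
        # renormalize so cosine similarity against text queries stays meaningful
        norm = np.maximum(np.linalg.norm(fused, axis=1, keepdims=True), 1e-8)
        self.feat[r, c] = fused / norm
        self.conf[r, c] = (w_old + w_new).squeeze(1)

    def query(self, text_feat):
        # cosine similarity of every cell against a unit-norm text embedding
        return self.feat @ text_feat
```

Querying the map with a text embedding then reduces to a cosine similarity over fused cell features, and cells with low `conf` flag high semantic uncertainty that exploration can target.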

📝 Abstract
The capability to efficiently search for objects in complex environments is fundamental for many real-world robot applications. Recent advances in open-vocabulary vision models have resulted in semantically-informed object navigation methods that allow a robot to search for an arbitrary object without prior training. However, these zero-shot methods have so far treated the environment as unknown for each consecutive query. In this paper we introduce a new benchmark for zero-shot multi-object navigation, allowing the robot to leverage information gathered from previous searches to more efficiently find new objects. To address this problem we build a reusable open-vocabulary feature map tailored for real-time object search. We further propose a probabilistic-semantic map update that mitigates common sources of errors in semantic feature extraction and leverage this semantic uncertainty for informed multi-object exploration. We evaluate our method on a set of object navigation tasks in both simulation as well as with a real robot, running in real-time on a Jetson Orin AGX. We demonstrate that it outperforms existing state-of-the-art approaches both on single and multi-object navigation tasks. Additional videos, code and the multi-object navigation benchmark will be available on https://finnbsch.github.io/OneMap.
Problem

Research questions and friction points this paper is trying to address.

Zero-shot object-navigation methods have so far treated the environment as unknown for each new query, discarding information gathered in previous searches.
Open-vocabulary semantic feature extraction is error-prone, so exploration must account for semantic uncertainty to be robust.
Semantic mapping must run in real time on edge hardware to be practical for real-world robots.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reusable open-vocabulary feature map tailored for real-time object search.
Probabilistic-semantic map update that mitigates common sources of error in semantic feature extraction.
Semantic uncertainty leveraged for informed multi-object exploration.
Finn Lukas Busch
Division of Robotics, Perception, and Learning, KTH Royal Institute of Technology, Sweden
Timon Homberger
Division of Robotics, Perception, and Learning, KTH Royal Institute of Technology, Sweden
Jesús Ortega-Peimbert
Division of Robotics, Perception, and Learning, KTH Royal Institute of Technology, Sweden
Quantao Yang
Division of Robotics, Perception, and Learning, KTH Royal Institute of Technology, Sweden
Olov Andersson
Assistant Professor at KTH Royal Institute of Technology. Previously: ASL@ETH Zurich
Robot Learning · Autonomous Robots · Motion Planning · Mapping · Navigation