Personalized Instance-based Navigation Toward User-Specific Objects in Realistic Environments

📅 2024-10-23
🏛️ Neural Information Processing Systems
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Existing visual navigation research is largely restricted to objects already present in the environment and struggles with user-specified personalized instances (e.g., “my red water cup”). Method: We introduce Personalized Instance-based Navigation (PIN), a task in which an embodied agent must locate and reach a specific personal object among multiple visually similar instances of the same category in realistic indoor scenes. To support the task, we propose PInNED, a dedicated dataset of photo-realistic indoor scenes augmented with additional 3D objects. Each target is specified through two modalities: a set of reference images on a neutral background and manually annotated textual descriptions. Contribution/Results: Experiments on PInNED with both modular and end-to-end agents expose the shortcomings of existing object-driven navigation methods, particularly in telling the target apart from similar instances, and illustrate the difficulty of personalized, instance-level navigation.

📝 Abstract
In recent years, research interest in visual navigation towards objects in indoor environments has grown significantly. This growth can be attributed to the recent availability of large navigation datasets in photo-realistic simulated environments, like Gibson and Matterport3D. However, the navigation tasks supported by these datasets are often restricted to the objects present in the environment at acquisition time. They also fail to account for the realistic scenario in which the target object is a user-specific instance that can be easily confused with similar objects and may be found in multiple locations within the environment. To address these limitations, we propose a new task called Personalized Instance-based Navigation (PIN), in which an embodied agent is tasked with locating and reaching a specific personal object by distinguishing it among multiple instances of the same category. The task is accompanied by PInNED, a dedicated new dataset composed of photo-realistic scenes augmented with additional 3D objects. In each episode, the target object is presented to the agent using two modalities: a set of visual reference images on a neutral background and manually annotated textual descriptions. Through comprehensive evaluations and analyses, we showcase the challenges of the PIN task as well as the performance and shortcomings of currently available methods designed for object-driven navigation, considering modular and end-to-end agents.
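
To make the episode description above concrete, here is a minimal Python sketch of how one PIN episode could be represented. It is an illustration only: the class name `PINEpisode`, every field name, and the `is_well_formed` check are assumptions made for this example and do not reflect the actual PInNED data format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PINEpisode:
    """Hypothetical container for one PIN episode (not the PInNED schema)."""

    scene_id: str                 # photo-realistic scene the agent is placed in
    category: str                 # object category shared by target and look-alikes
    target_instance_id: str       # the user-specific object to reach
    reference_images: List[str]   # paths to views of the target on a neutral background
    descriptions: List[str]       # manually annotated textual descriptions
    # Assumed field: other instances of the same category placed in the scene.
    distractor_instance_ids: List[str] = field(default_factory=list)

    def is_well_formed(self) -> bool:
        """Check that both modalities are present and the target is not a distractor."""
        has_both_modalities = bool(self.reference_images) and bool(self.descriptions)
        target_is_distinct = self.target_instance_id not in self.distractor_instance_ids
        return has_both_modalities and target_is_distinct


# Illustrative values only; none of these identifiers come from the dataset.
episode = PINEpisode(
    scene_id="scene_0001",
    category="mug",
    target_instance_id="mug_red_01",
    reference_images=["mug_red_01/view_0.png", "mug_red_01/view_1.png"],
    descriptions=["a red ceramic mug with a white handle"],
    distractor_instance_ids=["mug_blue_02", "mug_red_03"],
)
```

Keeping the same-category look-alikes as an explicit field mirrors the core difficulty of the task: the agent must reach the target instance, not just any object of that category.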

Problem

Research questions and friction points this paper is trying to address.

Personalized navigation to user-specific objects
Distinguishing target among similar instances
Realistic environments with multiple object locations

Innovation

Methods, ideas, or system contributions that make the work stand out.

Personalized Instance-based Navigation task
PInNED dataset with 3D objects
Visual and textual object presentation