LangMap: A Hierarchical Benchmark for Open-Vocabulary Goal Navigation

📅 2026-02-02
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses two gaps in open-vocabulary object navigation: weak semantic alignment between language and the environment, and the lack of benchmarks that evaluate navigation at multiple granularities. It introduces HieraNav, a task in which agents follow natural language instructions to reach targets at four semantic levels (scene, room, region, and instance) in real 3D indoor environments. To support this task, the authors present LangMap, a large-scale benchmark with human-verified, high-quality discriminative language descriptions that, for the first time, enable open-vocabulary navigation at all four granularities. LangMap's annotations achieve 23.8% higher discriminative accuracy than GOAT-Bench while using 75% fewer words. Evaluation further shows that context and memory mechanisms improve success rates, while long-tail categories, small or distant objects, and multi-target scenarios remain challenging.
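The summary's "75% fewer words" and the abstract's "four times fewer words" describe the same ratio. Writing w for description length in words (whether this is an average per description or a corpus total is not stated, so treat that reading as an assumption), the conversion is:

```latex
\frac{w_{\text{LangMap}}}{w_{\text{GOAT-Bench}}} = \frac{1}{4}
\quad\Longrightarrow\quad
1 - \frac{w_{\text{LangMap}}}{w_{\text{GOAT-Bench}}} = 1 - \frac{1}{4} = 0.75 = 75\%
```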

πŸ“ Abstract
The relationships between objects and language are fundamental to meaningful communication between humans and AI, and to practically useful embodied intelligence. We introduce HieraNav, a multi-granularity, open-vocabulary goal navigation task where agents interpret natural language instructions to reach targets at four semantic levels: scene, room, region, and instance. To this end, we present Language as a Map (LangMap), a large-scale benchmark built on real-world 3D indoor scans with comprehensive human-verified annotations and tasks spanning these levels. LangMap provides region labels, discriminative region descriptions, discriminative instance descriptions covering 414 object categories, and over 18K navigation tasks. Each target features both concise and detailed descriptions, enabling evaluation across different instruction styles. LangMap achieves superior annotation quality, outperforming GOAT-Bench by 23.8% in discriminative accuracy using four times fewer words. Comprehensive evaluations of zero-shot and supervised models on LangMap reveal that richer context and memory improve success, while long-tailed, small, context-dependent, and distant goals, as well as multi-goal completion, remain challenging. HieraNav and LangMap establish a rigorous testbed for advancing language-driven embodied navigation. Project: https://bo-miao.github.io/LangMap
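The abstract describes each HieraNav task as a natural language instruction that targets one of four semantic levels, with both a concise and a detailed description per target, and it reports how success varies across goal types. The Python sketch below shows one way such task records and a per-level success rate might be represented. The class names, field names, the `success_rate_by_level` helper, and the example instructions are all hypothetical; they do not reflect LangMap's actual data format or evaluation code.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class GoalLevel(Enum):
    # The four semantic levels named in the HieraNav task definition.
    SCENE = "scene"
    ROOM = "room"
    REGION = "region"
    INSTANCE = "instance"


@dataclass(frozen=True)
class NavTask:
    # Hypothetical task record; field names are illustrative only.
    task_id: str
    level: GoalLevel
    concise: str   # short instruction style
    detailed: str  # longer, more discriminative instruction style


@dataclass(frozen=True)
class EpisodeResult:
    # Hypothetical outcome of running an agent on one task.
    task_id: str
    success: bool


def success_rate_by_level(
    tasks: list[NavTask], results: list[EpisodeResult]
) -> dict[GoalLevel, float]:
    """Return the fraction of successful episodes for each goal level present in results."""
    level_of = {t.task_id: t.level for t in tasks}
    hits: dict[GoalLevel, int] = defaultdict(int)
    totals: dict[GoalLevel, int] = defaultdict(int)
    for r in results:
        level = level_of[r.task_id]  # KeyError if a result has no matching task
        totals[level] += 1
        hits[level] += int(r.success)
    return {level: hits[level] / totals[level] for level in totals}


# Usage with invented example instructions (not taken from LangMap).
tasks = [
    NavTask("t1", GoalLevel.ROOM, "Go to the kitchen.",
            "Go to the kitchen next to the dining area."),
    NavTask("t2", GoalLevel.INSTANCE, "Find the red mug.",
            "Find the red mug on the counter left of the sink."),
    NavTask("t3", GoalLevel.INSTANCE, "Find the lamp.",
            "Find the floor lamp beside the grey sofa."),
]
results = [
    EpisodeResult("t1", True),
    EpisodeResult("t2", False),
    EpisodeResult("t3", True),
]
rates = success_rate_by_level(tasks, results)
print({level.value: rate for level, rate in rates.items()})
# → {'room': 1.0, 'instance': 0.5}
```

Breaking results out by level mirrors the abstract's observation that difficulty differs across goal types. A multi-goal episode could be scored the same way, one record per goal, though how LangMap actually scores multi-goal completion is not described here.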
Problem

Research questions and friction points this paper is trying to address.

open-vocabulary goal navigation
embodied intelligence
natural language instructions
semantic levels
3D indoor navigation
Innovation

Methods, ideas, or system contributions that make the work stand out.

open-vocabulary navigation
hierarchical semantic grounding
language-driven embodied AI
discriminative language description
3D indoor benchmark