🤖 AI Summary
Existing semantic navigation benchmarks lack fine-grained evaluation of language understanding—particularly the agent’s ability to accurately ground linguistic elements (e.g., attributes, spatial relations, and category hierarchies) across varying levels of descriptive granularity. To address this gap, we introduce LangNavBench, the first language-centric, open-set evaluation benchmark for semantic navigation. We further propose the Multi-Layered Feature Map (MLFM), a queryable, hierarchical semantic mapping framework that explicitly models fine-grained language semantics by combining large vision-language models with semantic mapping and natural language grounding. Experiments show that MLFM significantly outperforms state-of-the-art map-based navigation methods on LangNavBench, with substantial gains in navigation accuracy—especially for small-object localization—and in consistency between navigation behavior and linguistic intent.
📝 Abstract
Recent progress in large vision-language models has driven improvements in language-based semantic navigation, where an embodied agent must reach a target object described in natural language. Despite these advances, we still lack a clear, language-focused benchmark for testing how well such agents ground the words in their instructions. We address this gap with LangNav, an open-set dataset specifically created to test an agent's ability to locate objects described at different levels of detail, from broad category names to fine attributes and object-object relations. Every description in LangNav was manually checked, yielding a lower error rate than existing lifelong- and semantic-navigation datasets. On top of LangNav we build LangNavBench, a benchmark that measures how well current semantic-navigation methods understand and act on these descriptions while moving toward their targets. LangNavBench allows us to systematically compare models on their handling of attributes, spatial and relational cues, and category hierarchies, offering the first thorough, language-centric evaluation of embodied navigation systems. We also present the Multi-Layered Feature Map (MLFM), a method that builds a queryable multi-layered semantic map and is particularly effective for small objects or instructions involving spatial relations. MLFM outperforms state-of-the-art mapping-based navigation baselines on the LangNav dataset.
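To make the idea of a "queryable multi-layered semantic map" concrete, here is a minimal, purely illustrative sketch (not the paper's implementation). It assumes each map layer is a 2-D grid of feature vectors in the same embedding space as a text query, and grounding a description amounts to finding the cell with the highest cosine similarity across layers; all names (`MultiLayerMap`, the layer keys, the feature dimension) are hypothetical.

```python
import numpy as np

class MultiLayerMap:
    """Toy multi-layered semantic map: one feature grid per semantic layer.

    Hypothetical sketch, not the MLFM implementation from the paper.
    Each layer maps a name (e.g. "objects", "attributes") to an
    (H, W, D) array of per-cell feature vectors.
    """

    def __init__(self, layers):
        self.layers = layers  # dict: layer name -> (H, W, D) grid

    def query(self, text_embedding):
        """Return (layer, (row, col), score) of the best-matching cell."""
        best = None
        for name, grid in self.layers.items():
            h, w, d = grid.shape
            flat = grid.reshape(h * w, d)
            # Cosine similarity between the query and every map cell.
            sims = flat @ text_embedding / (
                np.linalg.norm(flat, axis=1)
                * np.linalg.norm(text_embedding)
                + 1e-8
            )
            idx = int(np.argmax(sims))
            if best is None or sims[idx] > best[2]:
                best = (name, (idx // w, idx % w), float(sims[idx]))
        return best

# Toy demo: random features plus one planted cell matching the query.
rng = np.random.default_rng(0)
query = rng.normal(size=64)           # stand-in for a text embedding
grid = rng.normal(size=(4, 4, 64))    # stand-in for map features
grid[2, 3] = query                    # plant the described object here
m = MultiLayerMap({"objects": grid})
layer, cell, score = m.query(query)
```

In a real system the grids would be populated from vision-language-model features projected onto the map, and the query embedding would come from the instruction's text encoder; the lookup itself stays this simple.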