🤖 AI Summary
This work investigates whether Large Vision-Language Models (LVLMs) can interpret pixel-level outdoor maps and generate natural-language navigation instructions. To this end, we introduce MapBench, the first outdoor navigation benchmark explicitly designed for human-readable maps, comprising 100 real-world maps and over 1,600 path-finding tasks. We propose the Map Space Scene Graph (MSSG) as a cross-modal alignment index for fine-grained evaluation, and design a cognitively decomposed Chain-of-Thought (CoT) reasoning framework to systematically expose fundamental limitations of LVLMs in spatial reasoning and structured decision-making. Through zero-shot prompting, MSSG-guided inference, and multi-granularity evaluation, we comprehensively assess leading LVLMs. The results reveal an average task accuracy below 35%, confirming MapBench's high difficulty. The benchmark dataset, evaluation code, and implementation are publicly released.
📝 Abstract
In this paper, we introduce MapBench, the first dataset specifically designed for outdoor navigation on human-readable, pixel-based maps, curated from complex path-finding scenarios. MapBench comprises over 1,600 pixel-space map path-finding problems drawn from 100 diverse maps. In MapBench, LVLMs generate language-based navigation instructions given a map image and a query with beginning and end landmarks. For each map, MapBench provides a Map Space Scene Graph (MSSG) as an indexing data structure used to convert between natural language and the map's graph representation and to evaluate LVLM-generated results. We demonstrate that MapBench significantly challenges state-of-the-art LVLMs under both zero-shot prompting and a Chain-of-Thought (CoT) augmented reasoning framework that decomposes map navigation into sequential cognitive processes. Our evaluation of both open-source and closed-source LVLMs underscores the substantial difficulty posed by MapBench, revealing critical limitations in their spatial reasoning and structured decision-making capabilities. We release all the code and the dataset at https://github.com/taco-group/MapBench.
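To make the idea of a scene graph used as an evaluation index more concrete, the following is a minimal sketch and not the MapBench implementation. It assumes a graph whose nodes are landmarks and whose edge weights are distances; the landmark names, the weights, and the helper functions `shortest_path` and `route_length` are all hypothetical. Under those assumptions, a route extracted from a model's instructions could be checked for validity and compared against the shortest route between the queried landmarks.

```python
import heapq

# Hypothetical toy graph: nodes are landmarks, edge weights are illustrative
# distances. None of these names or values come from MapBench.
TOY_GRAPH = {
    "Main Gate": {"Fountain": 2.0, "Cafe": 5.0},
    "Fountain": {"Main Gate": 2.0, "Cafe": 1.0, "Aviary": 4.0},
    "Cafe": {"Main Gate": 5.0, "Fountain": 1.0, "Aviary": 1.5},
    "Aviary": {"Fountain": 4.0, "Cafe": 1.5},
}


def shortest_path(graph, start, goal):
    """Dijkstra's algorithm; returns (distance, path) or (inf, []) if unreachable."""
    queue = [(0.0, start, [start])]
    settled = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == goal:
            return dist, path
        if node in settled:
            continue
        settled.add(node)
        for neighbor, weight in graph[node].items():
            if neighbor not in settled:
                heapq.heappush(queue, (dist + weight, neighbor, path + [neighbor]))
    return float("inf"), []


def route_length(graph, route):
    """Length of a landmark sequence; inf if any hop is not an edge in the graph."""
    total = 0.0
    for a, b in zip(route, route[1:]):
        if b not in graph.get(a, {}):
            return float("inf")
        total += graph[a][b]
    return total


if __name__ == "__main__":
    best_dist, best_path = shortest_path(TOY_GRAPH, "Main Gate", "Aviary")
    # A route as it might be parsed from model-generated instructions (assumed).
    predicted = ["Main Gate", "Fountain", "Aviary"]
    print(best_path, best_dist, route_length(TOY_GRAPH, predicted))
```

In this toy setup, a predicted route that skips a non-existent connection gets an infinite length, which gives a simple way to separate invalid routes from valid but suboptimal ones. How the benchmark itself scores routes is defined in the paper and repository, not here.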