MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models

📅 2024-12-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing benchmarks inadequately evaluate foundation models' geo-spatial reasoning capabilities due to limited scope, a lack of multimodal tasks, and insufficient coverage of core geographic competencies. Method: The authors introduce MapEval, a multimodal benchmark for map-based reasoning comprising 700 multiple-choice questions about locations across 180 cities in 54 countries, spanning three task types: textual reasoning, API-based querying, and visual map interpretation. The tasks assess spatial-relation understanding, map information extraction, and route planning, and require collecting world information via real map toolchains and reasoning compositionally over heterogeneous geo-spatial context (POIs, travel distances, user reviews and ratings, imagery). Contribution/Results: A comprehensive evaluation of 28 state-of-the-art models shows Claude-3.5-Sonnet, GPT-4o, and Gemini-1.5-Pro performing competitively overall, with Claude-3.5-Sonnet leading on the agentic API tasks. Nevertheless, all models underperform humans by more than 20% on average, with accuracy on map-image interpretation falling below 40%, highlighting critical bottlenecks in geo-spatial intelligence.
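The multiple-choice setup described above can be sketched as a simple scoring loop. This is a hypothetical illustration only: the field names (`task_type`, `options`, `answer_index`) and the toy items are assumptions, not MapEval's actual schema.

```python
from dataclasses import dataclass

# Hypothetical MapEval-style item; field names are illustrative,
# not the benchmark's published schema.
@dataclass
class MapEvalItem:
    task_type: str       # "textual", "api", or "visual"
    question: str
    options: list[str]   # candidate answers
    answer_index: int    # index of the correct option

def accuracy(items: list[MapEvalItem], predict) -> float:
    """Score a model's predicted option indices against gold answers."""
    correct = sum(predict(item) == item.answer_index for item in items)
    return correct / len(items)

# Toy usage with a trivial "always pick option 0" baseline.
items = [
    MapEvalItem("textual", "Which city is closer to Paris?", ["Lyon", "Tokyo"], 0),
    MapEvalItem("visual", "Which marker lies north of the river?", ["A", "B"], 1),
]
print(accuracy(items, lambda item: 0))  # 0.5
```

Reporting per-`task_type` accuracy from the same loop would reproduce the kind of breakdown the paper discusses (e.g., the sub-40% visual-task scores).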

📝 Abstract
Recent advancements in foundation models have enhanced AI systems' capabilities in autonomous tool usage and reasoning. However, their ability in location- or map-based reasoning, which improves daily life by optimizing navigation, facilitating resource discovery, and streamlining logistics, has not been systematically studied. To bridge this gap, we introduce MapEval, a benchmark designed to assess diverse and complex map-based user queries with geo-spatial reasoning. MapEval features three task types (textual, API-based, and visual) that require collecting world information via map tools, processing heterogeneous geo-spatial contexts (e.g., named entities, travel distances, user reviews or ratings, images), and compositional reasoning, all of which state-of-the-art foundation models find challenging. Comprising 700 unique multiple-choice questions about locations across 180 cities and 54 countries, MapEval evaluates foundation models' ability to handle spatial relationships, map infographics, travel planning, and navigation challenges. Using MapEval, we conducted a comprehensive evaluation of 28 prominent foundation models. While no single model excelled across all tasks, Claude-3.5-Sonnet, GPT-4o, and Gemini-1.5-Pro achieved competitive performance overall. However, substantial performance gaps emerged, particularly on the agentic, API-based tasks, where agents built with Claude-3.5-Sonnet outperformed GPT-4o and Gemini-1.5-Pro by 16% and 21%, respectively; the gaps widened further against open-source LLMs. Our detailed analyses provide insights into the strengths and weaknesses of current models, though all models still fall short of human performance by more than 20% on average, struggling with complex map images and rigorous geo-spatial reasoning. This gap highlights MapEval's critical role in advancing general-purpose foundation models with stronger geo-spatial understanding.
Problem

Research questions and friction points this paper addresses.

Geospatial Reasoning
AI Model Evaluation
Map Understanding
Innovation

Methods, ideas, or system contributions that make the work stand out.

MapEval
Geospatial Reasoning
AI Assessment