CartoMapQA: A Fundamental Benchmark Dataset Evaluating Vision-Language Models on Cartographic Map Understanding

📅 2025-12-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work presents the first systematic evaluation of large vision-language models (LVLMs) on cartographic map understanding, revealing critical deficiencies in map symbol recognition, embedded-information extraction, scale interpretation, and path reasoning. To address this gap, the authors introduce CartoMapQA, a multi-level benchmark designed specifically for cartographic map understanding, comprising over 2,000 high-quality, human-annotated samples in open-ended and multiple-choice question formats, combined with OCR robustness analysis and a cognitively layered task design. Extensive experiments demonstrate that state-of-the-art LVLMs exhibit severe limitations in semantic parsing, spatial reasoning, and resilience to textual interference. The contribution includes a scalable, open-source evaluation framework, along with publicly released data and code, establishing both theoretical foundations and practical tools for improving LVLM reliability in geospatial intelligence applications such as navigation and urban planning.

📝 Abstract
The rise of Large Vision-Language Models (LVLMs) has unlocked new possibilities for seamlessly integrating visual and textual information. However, their ability to interpret cartographic maps remains largely unexplored. In this paper, we introduce CartoMapQA, a benchmark specifically designed to evaluate LVLMs' understanding of cartographic maps through question-answering tasks. The dataset includes over 2,000 samples, each composed of a cartographic map, a question (with open-ended or multiple-choice answers), and a ground-truth answer. These tasks span key low-, mid-, and high-level map interpretation skills, including symbol recognition, embedded information extraction, scale interpretation, and route-based reasoning. Our evaluation of both open-source and proprietary LVLMs reveals persistent challenges: models frequently struggle with map-specific semantics, exhibit limited geospatial reasoning, and are prone to Optical Character Recognition (OCR)-related errors. By isolating these weaknesses, CartoMapQA offers a valuable tool for guiding future improvements in LVLM architectures. Ultimately, it supports the development of models better equipped for real-world applications that depend on robust and reliable map understanding, such as navigation, geographic search, and urban planning. Our source code and data are openly available to the research community at: https://github.com/ungquanghuy-kddi/CartoMapQA.git
Problem

Research questions and friction points this paper is trying to address.

Evaluates vision-language models on cartographic map interpretation
Assesses map-specific semantics and geospatial reasoning capabilities
Identifies OCR-related errors in map understanding tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces CartoMapQA benchmark for evaluating map understanding
Tests models on symbol recognition and geospatial reasoning tasks
Identifies OCR errors and map semantics as key challenges
H. Ung
KDDI Research, Inc., Fujimino, Japan
Guillaume Habault
KDDI Research, Inc., Fujimino, Japan
Yasutaka Nishimura
KDDI Research, Inc., Fujimino, Japan
Hao Niu
KDDI Research, Inc. (Machine Learning)
Roberto Legaspi
KDDI Research, Inc. (Human behavior computing and AI)
Tomoki Oya
KDDI Research, Inc., Fujimino, Japan
Ryo Kojima
KDDI Research, Inc., Fujimino, Japan
Masato Taya
KDDI Research, Inc., Fujimino, Japan
C. Ono
KDDI Research, Inc., Fujimino, Japan
A. Minamikawa
KDDI Research, Inc., Fujimino, Japan
Yan Liu
University of Southern California, Los Angeles, USA