🤖 AI Summary
Research on visual question answering with maps (Map-VQA) for multimodal large language models (MLLMs) has focused almost exclusively on choropleth maps, with no systematic evaluation across other map types (such as cartograms and proportional symbol maps) or thematic domains like housing and crime. To address this gap, we introduce MapIQ, a benchmark for map question answering covering three map types, six thematic domains, and six visual analytical tasks, augmented by controlled experiments on cartographic design variables. Our contributions include: (i) substantially broader coverage of thematic map types and domains than prior Map-VQA benchmarks; (ii) a human performance baseline and a robustness evaluation under map-design changes; and (iii) empirical evidence of MLLMs' reliance on internal geographic knowledge and their sensitivity to design elements, including color schemes and legends. Together, these results highlight the need for evaluation frameworks tailored to map understanding.
📝 Abstract
Recent advances in multimodal large language models (MLLMs) have driven researchers to explore how well these models read data visualizations, e.g., bar charts and scatter plots. More recently, attention has shifted to visual question answering with maps (Map-VQA). However, Map-VQA research has primarily focused on choropleth maps, covering only a limited range of thematic categories and visual analytical tasks. To address these gaps, we introduce MapIQ, a benchmark dataset comprising 14,706 question-answer pairs across three map types (choropleth maps, cartograms, and proportional symbol maps), spanning topics from six distinct themes (e.g., housing, crime). We evaluate multiple MLLMs on six visual analytical tasks, comparing their performance against one another and against a human baseline. An additional experiment examining the impact of map-design changes (e.g., altered color schemes, modified legend designs, and removal of map elements) provides insights into the robustness and sensitivity of MLLMs, their reliance on internal geographic knowledge, and potential avenues for improving Map-VQA performance.