MapTrace: Scalable Data Generation for Route Tracing on Maps

📅 2025-12-22
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Multimodal large language models (MLLMs) exhibit limited performance on fine-grained spatial understanding tasks—such as map-based path tracing—due to scarce and costly pixel-level ground-truth annotations. Method: This paper introduces the first synthetic data generation paradigm specifically designed for path tracing, leveraging controllable map rendering and automated pixel-level path parsing to construct a large-scale, pixel-accurate synthetic training set comprising 23K samples. We perform supervised fine-tuning of MLLMs on this dataset and propose MapBench, a dedicated evaluation framework for systematic assessment. Contribution/Results: Experimental results demonstrate that our approach improves path tracing success rate by up to 6.4 points and significantly reduces Normalized Dynamic Time Warping (NDTW) error. To our knowledge, this is the first work to empirically validate that synthetic supervision can effectively enhance MLLMs’ fine-grained spatial reasoning capabilities.

Technology Category

Application Category

📝 Abstract
While Multimodal Large Language Models have achieved human-like performance on many visual and textual reasoning tasks, their proficiency in fine-grained spatial understanding, such as route tracing on maps remains limited. Unlike humans, who can quickly learn to parse and navigate maps, current models often fail to respect fundamental path constraints, in part due to the prohibitive cost and difficulty of collecting large-scale, pixel-accurate path annotations. To address this, we introduce a scalable synthetic data generation pipeline that leverages synthetic map images and pixel-level parsing to automatically produce precise annotations for this challenging task. Using this pipeline, we construct a fine-tuning dataset of 23k path samples across 4k maps, enabling models to acquire more human-like spatial capabilities. Using this dataset, we fine-tune both open-source and proprietary MLLMs. Results on MapBench show that finetuning substantially improves robustness, raising success rates by up to 6.4 points, while also reducing path-tracing error (NDTW). These gains highlight that fine-grained spatial reasoning, absent in pretrained models, can be explicitly taught with synthetic supervision.
Problem

Research questions and friction points this paper is trying to address.

Generates synthetic data for map route tracing
Improves spatial reasoning in multimodal language models
Enhances path constraint adherence with fine-tuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Synthetic data generation pipeline for scalable annotations
Pixel-level parsing on synthetic maps for precise path labeling
Fine-tuning MLLMs with synthetic supervision to improve spatial reasoning
🔎 Similar Papers
No similar papers found.