MapSAM2: Adapting SAM2 for Automatic Segmentation of Historical Map Images and Time Series

📅 2025-10-31
🤖 AI Summary
This paper addresses the challenge of automatic segmentation of historical map images and associated time-series data. We propose a unified spatiotemporal modeling framework that treats single or multi-temporal maps as pseudo-video inputs and performs few-shot fine-tuning of the SAM2 vision foundation model. Our contributions include: (1) a memory-augmented spatiotemporal attention mechanism enabling cross-frame feature alignment and instance association; (2) the Siegfried Building Time-Series dataset, along with a pseudo-time-series generation strategy using single-year maps to reduce annotation cost; and (3) a geometrically aware sliding-window tiling scheme for robust inference. Experiments demonstrate substantial improvements in both polygonal object segmentation accuracy (e.g., buildings) and inter-temporal instance linking under limited supervision, outperforming state-of-the-art methods. The code and dataset will be publicly released.
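The summary mentions a sliding-window tiling scheme for inference but gives no details. As a rough illustration only, the sketch below computes overlapping tile boxes that fully cover a map sheet. The function names `tile_origins` and `sliding_windows`, and the `tile` and `stride` parameters, are hypothetical and not taken from the paper; the "geometrically aware" part of the authors' scheme is not modeled here.

```python
from typing import List, Tuple


def tile_origins(length: int, tile: int, stride: int) -> List[int]:
    """Start offsets of windows of size `tile` that cover [0, length)."""
    if tile <= 0 or stride <= 0:
        raise ValueError("tile and stride must be positive")
    if length <= tile:
        # The whole axis fits in a single (possibly clipped) window.
        return [0]
    origins = list(range(0, length - tile + 1, stride))
    # Add a final window flush with the border so no pixels stay uncovered.
    if origins[-1] != length - tile:
        origins.append(length - tile)
    return origins


def sliding_windows(
    height: int, width: int, tile: int, stride: int
) -> List[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) boxes, clipped to the image."""
    return [
        (y, x, min(y + tile, height), min(x + tile, width))
        for y in tile_origins(height, tile, stride)
        for x in tile_origins(width, tile, stride)
    ]
```

Choosing `stride < tile` makes neighboring tiles overlap, which is a common way to reduce artifacts at tile borders when predictions are merged back into a full sheet.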

📝 Abstract
Historical maps are unique and valuable archives that document geographic features across different time periods. However, automated analysis of historical map images remains a significant challenge due to their wide stylistic variability and the scarcity of annotated training data. Constructing linked spatio-temporal datasets from historical map time series is even more time-consuming and labor-intensive, as it requires synthesizing information from multiple maps. Such datasets are essential for applications such as dating buildings, analyzing the development of road networks and settlements, and studying environmental changes. We present MapSAM2, a unified framework for automatically segmenting both historical map images and time series. Built on a visual foundation model, MapSAM2 adapts to diverse segmentation tasks with few-shot fine-tuning. Our key innovation is to treat both historical map images and time series as videos. For images, we process a set of tiles as a video, enabling the memory attention mechanism to incorporate contextual cues from similar tiles, leading to improved geometric accuracy, particularly for areal features. For time series, we introduce the annotated Siegfried Building Time Series Dataset and, to reduce annotation costs, propose generating pseudo time series from single-year maps by simulating common temporal transformations. Experimental results show that MapSAM2 learns temporal associations effectively and can accurately segment and link buildings in time series under limited supervision or using pseudo videos. We will release both our dataset and code to support future research.
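The abstract does not list which temporal transformations are simulated. The sketch below assumes two of the simplest ones, a building appearing and later disappearing, and shows how a single-year set of building instances could be turned into a pseudo time series with stable instance IDs. The function name `pseudo_time_series` and its parameters are hypothetical; this is not the authors' generation procedure.

```python
import random
from typing import List, Set


def pseudo_time_series(
    building_ids: List[int], num_frames: int, seed: int = 0
) -> List[Set[int]]:
    """For each frame, return the set of building IDs visible in it.

    Assumption: every building is visible over one contiguous interval,
    mimicking construction followed by optional demolition.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    rng = random.Random(seed)
    frames: List[Set[int]] = [set() for _ in range(num_frames)]
    for bid in building_ids:
        first = rng.randrange(num_frames)        # frame where it appears
        last = rng.randrange(first, num_frames)  # last frame where it exists
        for t in range(first, last + 1):
            frames[t].add(bid)
    return frames
```

Because each ID is reused across frames, the per-frame sets double as linking labels: a building with the same ID in two frames is the same instance, which is the kind of supervision a video-style model would need to learn temporal association.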
Problem

Research questions and friction points this paper is trying to address.

Automating historical map segmentation despite stylistic variability and data scarcity
Constructing linked spatio-temporal datasets from historical map time series efficiently
Reducing annotation costs for temporal analysis of geographic features in maps
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adapts the SAM2 visual foundation model to map segmentation via few-shot fine-tuning
Treats map images and time series as videos
Generates pseudo time series from single-year maps
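To make the "maps as videos" idea concrete, here is a minimal sketch of grouping a flat list of image tiles into fixed-length clips that a video model could consume frame by frame. The helper `tiles_to_clips` and the `clip_len` parameter are hypothetical; the paper's actual tile ordering and clip construction are not described in this summary.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def tiles_to_clips(tiles: Sequence[T], clip_len: int) -> List[List[T]]:
    """Split tiles into consecutive clips of at most `clip_len` frames."""
    if clip_len <= 0:
        raise ValueError("clip_len must be positive")
    return [
        list(tiles[i:i + clip_len])
        for i in range(0, len(tiles), clip_len)
    ]
```

For example, `tiles_to_clips(["a", "b", "c", "d", "e"], 2)` yields three clips, the last of which holds only the leftover tile.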
Xue Xia
Pinterest
Randall Balestriero
AI Researcher
Self Supervised Learning, Useful Theory, Splines
Tao Zhang
Wuhan University, Wuhan, China
Yixin Zhou
ETH Zurich, Zurich, Switzerland
Andrew Ding
ETH Zurich, Zurich, Switzerland
Dev Saini
ETH Zurich, Zurich, Switzerland
Lorenz Hurni
ETH Zurich, Zurich, Switzerland