Few-Shot Segmentation of Historical Maps via Linear Probing of Vision Foundation Models

📅 2025-06-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address few-shot semantic segmentation of historical maps, where visual styles vary widely and annotations are scarce, this paper proposes parameter-efficient fine-tuning on top of a frozen Vision Transformer (ViT) backbone with a linear-probe adapter, training only 689k parameters, or 0.21% of the total model size. The method integrates prompt-guided feature alignment with multi-scale decoding, targeting the low-contrast, high-noise characteristics of historical cartography. On the Siegfried benchmark, it outperforms the prior state of the art in vineyard and railway segmentation by +5% and +13% relative mIoU, respectively, in the 10-shot setting, and by around +20% in the more challenging 5-shot setting. For building block segmentation on the ICDAR 2021 competition dataset, it attains a mean Panoptic Quality (PQ) of 67.3%, even though it is not optimized for this shape-sensitive metric. The work is presented as the first to introduce linear probing into few-shot historical map segmentation; it improves robustness to domain shift and enables precise segmentation of key land features (vineyards, railways, and building blocks) from only 5 to 10 annotated images.

📝 Abstract

As rich sources of history, maps provide crucial insights into historical changes, yet their diverse visual representations and limited annotated data pose significant challenges for automated processing. We propose a simple yet effective approach for few-shot segmentation of historical maps, leveraging the rich semantic embeddings of large vision foundation models combined with parameter-efficient fine-tuning. Our method outperforms the state-of-the-art on the Siegfried benchmark dataset in vineyard and railway segmentation, achieving +5% and +13% relative improvements in mIoU in 10-shot scenarios and around +20% in the more challenging 5-shot setting. Additionally, it demonstrates strong performance on the ICDAR 2021 competition dataset, attaining a mean PQ of 67.3% for building block segmentation, despite not being optimized for this shape-sensitive metric, underscoring its generalizability. Notably, our approach maintains high performance even in extremely low-data regimes (10- & 5-shot), while requiring only 689k trainable parameters - just 0.21% of the total model size. Our approach enables precise segmentation of diverse historical maps while drastically reducing the need for manual annotations, advancing automated processing and analysis in the field. Our implementation is publicly available at: https://github.com/RafaelSterzinger/few-shot-map-segmentation.
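The parameter budget quoted in the abstract can be sanity-checked with a few lines of arithmetic. The sketch below uses only the two figures stated there (689k trainable parameters, 0.21% of the total); the implied total is an estimate, since both inputs are rounded, and the variable names are illustrative.

```python
# Figures quoted in the abstract.
trainable_params = 689_000    # "689k trainable parameters"
trainable_fraction = 0.0021   # "0.21% of the total model size"

# Implied total model size. Approximate: both inputs above are rounded.
implied_total = trainable_params / trainable_fraction

# Everything that is not trained stays frozen.
implied_frozen = implied_total - trainable_params

print(f"implied total:  {implied_total / 1e6:.0f}M parameters")
print(f"implied frozen: {implied_frozen / 1e6:.0f}M parameters")
```

Under these figures the full model comes to roughly 328 million parameters, of which all but about 0.7 million remain frozen during training.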

Problem

Research questions and friction points this paper is trying to address.

Automated segmentation of diverse historical maps with limited annotations

Improving few-shot performance for vineyard and railway segmentation

Reducing manual annotation needs for historical map analysis
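Progress on these problems is reported as mIoU, with gains stated as relative improvements. As a rough illustration of both quantities, the sketch below computes mIoU on a toy label grid and a relative gain between two scores. The grids and the two scores are made-up values, and the averaging convention (which classes are included, per image or per dataset) is an assumption that may differ from the benchmark's protocol.

```python
def iou(pred, target, cls):
    """Intersection over union for one class on two equally sized 2D label grids."""
    inter = union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            in_pred, in_target = p == cls, t == cls
            inter += in_pred and in_target
            union += in_pred or in_target
    # None signals that the class appears in neither grid.
    return inter / union if union else None


def mean_iou(pred, target, classes):
    """Average the per-class IoU over all classes present in either grid."""
    scores = [iou(pred, target, c) for c in classes]
    scores = [s for s in scores if s is not None]
    return sum(scores) / len(scores)


def relative_gain(new, baseline):
    """Relative improvement of `new` over `baseline`, as a fraction."""
    return (new - baseline) / baseline


# Toy 2x3 example: 0 = background, 1 = target class (e.g. railway).
target = [[0, 0, 1],
          [0, 1, 1]]
pred = [[0, 1, 1],
        [0, 1, 1]]

miou = mean_iou(pred, target, classes=[0, 1])
print(f"mIoU: {miou:.3f}")

# Hypothetical scores, chosen only to show how a relative gain is computed.
print(f"relative gain: {relative_gain(0.525, 0.50):+.0%}")
```

Note that a relative gain of +5% on a baseline of 0.50 mIoU corresponds to 2.5 absolute points, which is why the distinction between relative and absolute improvements matters when comparing reported numbers.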

Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages vision foundation models' embeddings

Uses parameter-efficient fine-tuning

Achieves high performance with few shots
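The general idea behind these points, linear probing, can be sketched in a few lines: a feature extractor stays frozen and only a small linear head is trained on a handful of labelled examples. The toy below is a generic illustration in plain Python, not the authors' implementation. `frozen_backbone` is a hypothetical stand-in for a foundation model, the patches are synthetic, and the head is a logistic-regression classifier trained with stochastic gradient descent.

```python
import math
import random


def frozen_backbone(patch):
    """Hypothetical stand-in for a frozen foundation model.

    Maps a patch (list of pixel intensities in [0, 1]) to a fixed feature
    vector. It has no trainable parameters and is never updated.
    """
    mean = sum(patch) / len(patch)
    spread = max(patch) - min(patch)
    return [mean, spread]


def train_linear_probe(features, labels, lr=0.5, epochs=500):
    """Fit a logistic-regression head (weights and bias) on frozen features."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            grad = p - y  # gradient of the cross-entropy loss w.r.t. z
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b


def predict(w, b, x):
    """Return 1 if the linear head scores the feature vector above zero."""
    return int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0)


# Synthetic 5-shot data: five bright patches (label 0) and five dark ones (label 1).
random.seed(0)
bright = [[random.uniform(0.7, 1.0) for _ in range(16)] for _ in range(5)]
dark = [[random.uniform(0.0, 0.3) for _ in range(16)] for _ in range(5)]

train_features = [frozen_backbone(p) for p in bright + dark]
train_labels = [0] * 5 + [1] * 5

w, b = train_linear_probe(train_features, train_labels)

# Only the head is trainable: two weights plus one bias in this toy.
print("trainable parameters:", len(w) + 1)
print("dark patch   ->", predict(w, b, frozen_backbone([0.1] * 16)))
print("bright patch ->", predict(w, b, frozen_backbone([0.9] * 16)))
```

In a segmentation setting the same head would be applied to every patch or pixel feature of a dense feature map rather than to one vector per image; the architecture actually used in the paper is available in the linked repository.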
