🤖 AI Summary
Existing historical map datasets cover narrow spatial and temporal ranges and are costly to annotate, which hinders nationwide, long-term analysis of land-use change. To address this, we introduce a historical map dataset for weakly supervised semantic segmentation that spans the 18th to 20th centuries across metropolitan France (548,305 km²). We propose a cross-century, multi-style weakly supervised segmentation setup with a CycleGAN-based style alignment step, which allows models to be trained on historical maps using only modern ground-truth labels, without costly historical annotations. The approach relies on U-Net and other deep learning architectures, together with domain-specific digitization protocols and annotation guidelines. Evaluated on 22,878 km² of manually annotated areas, the weakly supervised model reaches 86% of the mIoU of its fully supervised counterpart while requiring no historical labels for training. This work supports quantitative, century-scale analysis of landscape change at national scale.
📝 Abstract
Historical maps offer an invaluable perspective on how territories evolved over past centuries, long before satellite or remote sensing technologies existed. Deep learning methods have shown promising results in segmenting historical maps, but publicly available datasets typically focus on a single map type or period, require extensive and costly annotations, and are not suited for nationwide, long-term analyses. In this paper, we introduce a new dataset of historical maps tailored for analyzing large-scale, long-term land use and land cover evolution with limited annotations. Spanning metropolitan France (548,305 km²), our dataset contains three map collections from the 18th, 19th, and 20th centuries. We provide both comprehensive modern labels and 22,878 km² of manually annotated historical labels for the 18th- and 19th-century maps. Our dataset illustrates the complexity of the segmentation task, featuring stylistic inconsistencies, interpretive ambiguities, and significant landscape changes (e.g., marshlands disappearing in favor of forests). We assess the difficulty of these challenges by benchmarking three approaches: a fully supervised model trained with historical labels, and two weakly supervised models that rely only on modern annotations. The latter either use the modern labels directly or first perform image-to-image translation to address the stylistic gap between historical and contemporary maps. Finally, we discuss how these methods can support long-term environmental monitoring, offering insights into centuries of landscape transformation. Our official project repository is publicly available at https://github.com/Archiel19/FRAx4.git.
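The summary above reports the weakly supervised result as a fraction of the fully supervised mIoU. As a reading aid, the sketch below shows one common way to compute mean intersection-over-union and that ratio. It is an illustration only: the function name `mean_iou`, the three-class toy labels, and the choice to skip classes absent from both maps are assumptions, not details taken from the paper or its repository.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean IoU over the classes that occur in the prediction or the target.

    Assumption: classes absent from both maps are skipped rather than
    counted as 0 or 1. Other conventions exist.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for c in range(num_classes):
        # Pixels where both maps agree on class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where at least one map says class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    if not ious:
        raise ValueError("no class from range(num_classes) occurs in the inputs")
    return sum(ious) / len(ious)


# Hypothetical flattened label maps (one integer class id per pixel).
target = [0, 0, 1, 1, 2, 2]        # manually annotated historical labels
full_pred = [0, 0, 1, 1, 2, 1]     # output of a fully supervised model
weak_pred = [0, 1, 1, 1, 2, 0]     # output of a weakly supervised model

miou_full = mean_iou(full_pred, target, num_classes=3)
miou_weak = mean_iou(weak_pred, target, num_classes=3)

# Share of the fully supervised score retained by the weakly supervised model.
relative = miou_weak / miou_full
print(f"full={miou_full:.3f} weak={miou_weak:.3f} relative={relative:.1%}")
```

On real rasters the same quantities are usually accumulated from a confusion matrix over all evaluation tiles rather than from Python lists, but the ratio is formed the same way.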