Creating a Historical Migration Dataset from Finnish Church Records, 1800-1920

📅 2025-06-09
🤖 AI Summary
This study addresses the challenge of transforming handwritten church migration records from Finland (1800–1920) into structured demographic data for historical population analysis. The corpus comprises approximately 200,000 digitized page images and over 6 million individual migration entries, which were previously impractical to use in quantitative research because the handwritten content had not been transcribed into structured form. Method: The authors present an end-to-end deep learning pipeline that integrates layout analysis, table detection, cell classification, and handwritten text recognition (HTR) to automate large-scale structural extraction. Contribution/Results: The result is a structured dataset suitable for research on internal migration, urbanization, family migration, and the spread of disease. A case study of the Elimäki parish shows how local migration histories can be reconstructed, and the work offers a reusable approach for digital humanities and historical demography.

📝 Abstract
This article presents a large-scale effort to create a structured dataset of internal migration in Finland between 1800 and 1920 using digitized church moving records. These records, maintained by Evangelical-Lutheran parishes, document the migration of individuals and families and offer a valuable source for studying historical demographic patterns. The dataset includes over six million entries extracted from approximately 200,000 images of handwritten migration records. The data extraction process was automated using a deep learning pipeline that included layout analysis, table detection, cell classification, and handwriting recognition. The complete pipeline was applied to all images, resulting in a structured dataset suitable for research. The dataset can be used to study internal migration, urbanization, family migration, and the spread of disease in preindustrial Finland. A case study from the Elimäki parish shows how local migration histories can be reconstructed. The work demonstrates how large volumes of handwritten archival material can be transformed into structured data to support historical and demographic research.
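The four stages named in the abstract can be read as a simple chain in which each stage enriches a shared page record. The sketch below only illustrates that ordering. It is not the authors' implementation: every class, function, and field name is hypothetical, and each stage is a stub standing in for a trained model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Cell:
    row: int
    col: int
    kind: str = "unknown"  # set by the cell classification stage
    text: str = ""         # set by the handwriting recognition stage


@dataclass
class Page:
    image_id: str
    regions: List[str] = field(default_factory=list)
    cells: List[Cell] = field(default_factory=list)


Stage = Callable[[Page], Page]


def layout_analysis(page: Page) -> Page:
    # Stub: a real model would locate text and table regions in the image.
    page.regions = ["table"]
    return page


def table_detection(page: Page) -> Page:
    # Stub: a real model would recover the row and column grid of each table.
    if "table" in page.regions:
        page.cells = [Cell(row=0, col=c) for c in range(3)]
    return page


def cell_classification(page: Page) -> Page:
    # Stub: a real model would decide what kind of content each cell holds.
    for cell in page.cells:
        cell.kind = "handwritten" if cell.col < 2 else "empty"
    return page


def handwriting_recognition(page: Page) -> Page:
    # Stub: a real HTR model would transcribe the handwritten cells.
    for cell in page.cells:
        if cell.kind == "handwritten":
            cell.text = f"<text r{cell.row} c{cell.col}>"
    return page


# Stage order follows the abstract; the list itself is an assumption.
PIPELINE: List[Stage] = [
    layout_analysis,
    table_detection,
    cell_classification,
    handwriting_recognition,
]


def run_pipeline(image_id: str, stages: List[Stage] = PIPELINE) -> Page:
    page = Page(image_id=image_id)
    for stage in stages:
        page = stage(page)
    return page


def to_rows(page: Page) -> List[Dict[int, str]]:
    # Group recognized cell text by table row to form structured entries.
    rows: Dict[int, Dict[int, str]] = {}
    for cell in page.cells:
        rows.setdefault(cell.row, {})[cell.col] = cell.text
    return [rows[r] for r in sorted(rows)]
```

Calling `to_rows(run_pipeline("some_image_id"))` yields one dictionary per table row, keyed by column index. Keeping the stages as separate functions over a shared record makes it possible to replace or evaluate one stage without touching the others.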
Problem

Research questions and friction points this paper is trying to address.

Creating a structured dataset from Finnish church migration records
Automating extraction of handwritten records using deep learning
Studying historical migration, urbanization, and disease spread
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated deep learning pipeline for data extraction
Processed 200,000 handwritten record images
Generated structured dataset for migration research
Ari Vesalainen
University of Helsinki, Department of Computer Science, Helsinki, Finland
Jenna Kanerva
Department of Computing, University of Turku
Natural Language Processing, Machine Learning
Aida Nitsch
University of Turku, Department of Biology, Turku, Finland
Kiia Korsu
University of Turku, Department of Biology, Turku, Finland
Ilari Larkiola
University of Turku, Department of Computing, TurkuNLP, Turku, Finland
Laura Ruotsalainen
Professor, Spatiotemporal Data Analysis, Dept. of Computer Science, University of Helsinki
#UnivHelsinkiCS, Navigation, Computer vision
Filip Ginter
University of Turku
language technology, natural language processing