Using Text-Based Life Trajectories from Swedish Register Data to Predict Residential Mobility with Pretrained Transformers

📅 2025-11-26
🤖 AI Summary
This paper addresses two long-standing challenges in register data analysis: high-cardinality categorical variables and coding schemes that change over time. To tackle these, it converts individual life trajectories into text. Drawing on multidimensional register records (residence, work, education, income, family circumstances) for 6.9 million Swedish residents from 2001–2013, the study builds textual life-course sequences and uses them to predict residential mobility during 2013–2017. It compares several NLP architectures (LSTM, DistilBERT, BERT, Qwen) and finds that sequential and transformer-based models capture temporal and semantic structure more effectively than baseline models, indicating that the textual representation preserves meaningful information about individual pathways. The authors present the dataset as a rigorous testbed for developing and evaluating sequence-modeling approaches in the social sciences.

📝 Abstract
We transform large-scale Swedish register data into textual life trajectories to address two long-standing challenges in data analysis: high cardinality of categorical variables and inconsistencies in coding schemes over time. Leveraging this uniquely comprehensive population register, we convert register data from 6.9 million individuals (2001-2013) into semantically rich texts and predict individuals' residential mobility in later years (2013-2017). These life trajectories combine demographic information with annual changes in residence, work, education, income, and family circumstances, allowing us to assess how effectively such sequences support longitudinal prediction. We compare multiple NLP architectures (including LSTM, DistilBERT, BERT, and Qwen) and find that sequential and transformer-based models capture temporal and semantic structure more effectively than baseline models. The results show that textualized register data preserves meaningful information about individual pathways and supports complex, scalable modeling. Because few countries maintain longitudinal microdata with comparable coverage and precision, this dataset enables analyses and methodological tests that would be difficult or impossible elsewhere, offering a rigorous testbed for developing and evaluating new sequence-modeling approaches. Overall, our findings demonstrate that combining semantically rich register data with modern language models can substantially advance longitudinal analysis in social sciences.

Problem

Research questions and friction points this paper is trying to address.

Predict residential mobility using textual life trajectories from register data
Address high cardinality and coding inconsistencies in longitudinal data analysis
Evaluate NLP models for capturing temporal and semantic structure in sequences
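
One way to picture how text can absorb coding inconsistencies is to translate year-specific codes into plain-language labels before any modeling. The following is a minimal sketch under stated assumptions, not the authors' pipeline: the two codebooks, the code values, the labels, and the 2008 cut-off year are all hypothetical and chosen only for illustration.

```python
# Hypothetical codebooks: the same occupation carries different codes
# before and after an (assumed) revision of the coding scheme.
CODEBOOK_BY_SCHEME = {
    "scheme_a": {"7221": "software developer", "8010": "primary school teacher"},
    "scheme_b": {"62010": "software developer", "85200": "primary school teacher"},
}


def scheme_for_year(year: int) -> str:
    """Pick the coding scheme in force for a year (the 2008 cut-off is an assumption)."""
    return "scheme_a" if year < 2008 else "scheme_b"


def code_to_label(year: int, code: str) -> str:
    """Translate a year-specific code into a plain-text label."""
    codebook = CODEBOOK_BY_SCHEME[scheme_for_year(year)]
    return codebook.get(code, "unknown category")


# The same occupation, recorded under two different schemes,
# maps to one shared label once it is expressed as text.
labels = [code_to_label(2005, "7221"), code_to_label(2010, "62010")]
```

Because the model sees labels rather than raw codes, a rare or newly introduced category is still described with ordinary words instead of becoming another sparse categorical level.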

Innovation

Methods, ideas, or system contributions that make the work stand out.

Transform register data into textual life trajectories
Use pretrained transformers to predict residential mobility
Compare NLP architectures for longitudinal sequence modeling
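
The first contribution, turning annual register records into a textual life trajectory, can be sketched roughly as follows. This is an illustrative assumption about what such a conversion might look like, not the paper's actual template: the `YearRecord` fields, the sentence wording, and the example values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class YearRecord:
    """One hypothetical annual register entry for a person."""
    year: int
    municipality: str
    occupation: str
    income_band: str


def describe_year(prev: Optional[YearRecord], curr: YearRecord) -> str:
    """Describe the first year in full, later years only by what changed."""
    parts = []
    if prev is None:
        parts.append(f"lived in {curr.municipality}")
        parts.append(f"worked as {curr.occupation}")
        parts.append(f"had {curr.income_band} income")
    else:
        if curr.municipality != prev.municipality:
            parts.append(f"moved from {prev.municipality} to {curr.municipality}")
        if curr.occupation != prev.occupation:
            parts.append(f"changed occupation from {prev.occupation} to {curr.occupation}")
        if curr.income_band != prev.income_band:
            parts.append(f"income changed from {prev.income_band} to {curr.income_band}")
    if not parts:
        parts.append("no recorded changes")
    return f"In {curr.year}: " + ", ".join(parts) + "."


def build_trajectory(demographics: str, records: List[YearRecord]) -> str:
    """Concatenate a demographic sentence with one sentence per year, in time order."""
    sentences = [demographics]
    prev = None
    for rec in sorted(records, key=lambda r: r.year):
        sentences.append(describe_year(prev, rec))
        prev = rec
    return " ".join(sentences)


records = [
    YearRecord(2001, "Lund", "student", "low"),
    YearRecord(2002, "Lund", "student", "low"),
    YearRecord(2003, "Stockholm", "engineer", "medium"),
]
text = build_trajectory("Person born in 1980.", records)
print(text)
```

A string like `text` could then be tokenized and passed to any sequence classifier; the choice of model and the prediction target (a move in the later period) are left outside this sketch.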
Authors

Philipp Stark
Department of Human Geography (KEG), Lund University, Sweden
Alexandros Sopasakis
Lund University
Hybrid Stochastic / Machine learning / Monte Carlo Systems, Noise driven problems & SPDEs, Nonlinear Kinetic Equations, Traffic
Ola Hall
Department of Human Geography (KEG), Lund University, Sweden
Markus Grillitsch
Department of Human Geography (KEG), Lund University, Sweden