π€ AI Summary
Existing global migration datasets lack annual resolution, country-of-birth granularity, and temporal coverage spanning recent decades, hindering fine-grained analysis of migration dynamics.
Method: We propose the first deep recurrent neural network framework tailored for long-term global migration modeling, integrating 18 geospatial, socioeconomic, and demographic covariates. The architecture incorporates uncertainty propagation and ensemble inference to yield interpretable prediction intervals.
Contribution/Results: We construct a globally comprehensive, annually resolved migration flow and stock dataset covering 230 countries and territories from 1990 to 2023βthe first of its kind with birth-country disaggregation. Our model significantly outperforms conventional five-year interval estimates in both accuracy and timeliness on held-out data. All data, source code, and trained model weights are fully open-sourced, establishing a reproducible, extensible foundational resource for migration research.
π Abstract
We present a novel and detailed dataset on origin-destination annual migration flows and stocks between 230 countries and regions, spanning the period from 1990 to the present. Our flow estimates are further disaggregated by country of birth, providing a comprehensive picture of migration over the last 43 years. The estimates are obtained by training a deep recurrent neural network to learn flow patterns from 18 covariates for all countries, including geographic, economic, cultural, societal, and political information. The recurrent architecture of the neural network means that the entire past can influence current migration patterns, allowing us to learn long-range temporal correlations. By training an ensemble of neural networks and additionally pushing uncertainty on the covariates through the trained network, we obtain confidence bounds for all our estimates, allowing researchers to pinpoint the geographic regions most in need of additional data collection. We validate our approach on various test sets of unseen data, demonstrating that it significantly outperforms traditional methods estimating five-year flows while delivering a significant increase in temporal resolution. The model is fully open source: all training data, neural network weights, and training code are made public alongside the migration estimates, providing a valuable resource for future studies of human migration.