🤖 AI Summary
This study addresses the persistent challenge of subnational population estimation in regions where census data are sparse, outdated, or spatially coarse. It presents the first systematic evaluation of embeddings from a geospatial foundation model, the Population Dynamics Foundation Model (PDFM), as alternatives to conventional covariates for multiscale population modeling in Brazil, Nigeria, and the United States. Under geographically structured validation, PDFM embeddings reduce unexplained variance by a median of 20.1% and Kullback–Leibler (KL) divergence by a median of 23.2%, with the largest gains in larger and less-developed subnational areas. However, the embeddings transfer less flexibly across spatial aggregations than conventional handcrafted covariates. These findings clarify both the promise and the predictable limitations of foundation models in demographic estimation.
📝 Abstract
Reliable subnational population estimates are essential for applications, yet remain difficult to produce where censuses are sparse, outdated or spatially coarse. Existing population-mapping workflows rely on hand-built geospatial covariates, such as settlement extent, night-time lights and environmental conditions, which must be assembled and harmonised across scales and geographies. Geospatial foundation models offer an alternative by learning reusable representations of place from more multifaceted and heterogeneous data sources. Here, we benchmark Population Dynamics Foundation Model (PDFM) embeddings against harmonised geospatial covariates for subnational population estimation in Brazil, Nigeria and the United States. Under geographically structured validation, PDFM improved predictive fit, reducing unexplained variance by a median of 20.1% (IQR: 10.0-33.2%, across country-model comparisons) and Kullback-Leibler divergence by 23.2% (9.2-26.2%). However, these gains were uneven. PDFM was most advantageous where the geospatial covariates weakly characterised settlement context, such as in larger and less-developed subnational areas. Moreover, PDFM performance was scale-coupled, with embeddings transferring less flexibly across spatial aggregations than geospatial covariates. These findings show that geospatial foundation-model representations of place can improve population estimation in data-poor settings, but that their benefits break down predictably under spatial scale mismatch, revealing a fundamental limitation of current geospatial AI.
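
The two headline metrics can be illustrated with a minimal sketch. The abstract does not give exact definitions, so the code assumes that unexplained variance is the residual sum of squares divided by the total sum of squares (that is, 1 − R²), and that KL divergence compares each area's share of the observed total population with its share of the predicted total. The function names and the toy numbers are hypothetical and are not taken from the study.

```python
import math

def unexplained_variance(observed, predicted):
    """Assumed definition: SS_res / SS_tot, i.e. 1 - R^2."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return ss_res / ss_tot

def kl_divergence(observed, predicted, eps=1e-12):
    """Assumed definition: KL(P || Q) over per-area population shares."""
    obs_total = sum(observed)
    pred_total = sum(predicted)
    kl = 0.0
    for o, q in zip(observed, predicted):
        p_share = o / obs_total
        q_share = max(q / pred_total, eps)  # guard against log of zero
        if p_share > 0:
            kl += p_share * math.log(p_share / q_share)
    return kl

def relative_reduction(baseline, candidate):
    """Percentage reduction of a loss-like metric relative to a baseline."""
    return 100.0 * (baseline - candidate) / baseline

# Hypothetical held-out areas: observed counts and two sets of predictions.
observed = [1200.0, 800.0, 400.0, 100.0]
covariate_pred = [1000.0, 900.0, 500.0, 200.0]   # stand-in for covariate model
embedding_pred = [1150.0, 820.0, 420.0, 120.0]   # stand-in for embedding model

uv_gain = relative_reduction(
    unexplained_variance(observed, covariate_pred),
    unexplained_variance(observed, embedding_pred),
)
kl_gain = relative_reduction(
    kl_divergence(observed, covariate_pred),
    kl_divergence(observed, embedding_pred),
)
print(f"unexplained variance reduced by {uv_gain:.1f}%")
print(f"KL divergence reduced by {kl_gain:.1f}%")
```

Read this way, a 20.1% reduction in unexplained variance means the embedding model leaves about four fifths of the residual variance left by the covariate model, not that R² itself rises by 20.1 points.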