🤖 AI Summary
Traditional urban indicators are often costly, spatially inconsistent, and slow to update, which limits their use for fine-grained, dynamic monitoring. This study benchmarks three Earth embedding families (AlphaEarth, Prithvi, and Clay) within a unified supervised-learning framework, predicting 14 neighborhood-scale indicators spanning crime, income, health, and travel behavior across six U.S. metropolitan areas from 2020 to 2023 using satellite imagery. The evaluation shows that Earth embeddings are most predictive for outcomes closely tied to built-environment structure, such as chronic health burdens and dominant commuting modes, while indicators shaped by local policy or fine-scale behavior, such as cycling, remain difficult to infer. In controlled dimensionality experiments, the compact 64-dimensional AlphaEarth embeddings remain more informative than 64-dimensional reductions of Prithvi and Clay. Predictive performance is comparatively stable across years but varies markedly across cities, and exploratory analysis suggests that this cross-city variation is associated with urban form in task-specific ways.
📝 Abstract
Conventional urban indicators derived from censuses, surveys, and administrative records are often costly, spatially inconsistent, and slow to update. Recent geospatial foundation models enable Earth embeddings, compact satellite image representations transferable across downstream tasks, but their utility for neighborhood-scale urban monitoring remains unclear. Here, we benchmark three Earth embedding families, AlphaEarth, Prithvi, and Clay, for urban signal prediction across six U.S. metropolitan areas from 2020 to 2023. Using a unified supervised-learning framework, we predict 14 neighborhood-level indicators spanning crime, income, health, and travel behavior, and evaluate performance under four settings: global, city-wise, year-wise, and city-year. Results show that Earth embeddings capture substantial urban variation, with the highest predictive skill for outcomes more directly tied to built-environment structure, including chronic health burdens and dominant commuting modes. By contrast, indicators shaped more strongly by fine-scale behavior and local policy, such as cycling, remain difficult to infer. Predictive performance varies markedly across cities but remains comparatively stable across years, indicating strong spatial heterogeneity alongside temporal robustness. Exploratory analysis suggests that cross-city variation in predictive performance is associated with urban form in task-specific ways. Controlled dimensionality experiments show that representation efficiency is critical: compact 64-dimensional AlphaEarth embeddings remain more informative than 64-dimensional reductions of Prithvi and Clay. This study establishes a benchmark for evaluating Earth embeddings in urban remote sensing and demonstrates their potential as scalable, low-cost features for SDG-aligned neighborhood-scale urban monitoring.
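The four evaluation settings and the 64-dimensional comparison described above can be illustrated with a minimal sketch. The abstract does not name the regression model, the train/test protocol, or the reduction method, so everything here is an assumption made for illustration: closed-form ridge regression, a random hold-out split within each group, PCA as the reduction step, and synthetic data in place of real embeddings and indicators. The function names `pca_reduce`, `ridge_r2`, and `evaluate_settings` are hypothetical.

```python
import numpy as np

def pca_reduce(X, k):
    """Project rows of X onto the top-k principal components (PCA via SVD).
    Assumption: the paper's reduction method is not specified in the abstract."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:k].T

def ridge_r2(X_tr, y_tr, X_te, y_te, alpha=1.0):
    """Fit closed-form ridge regression on the training split and return test R^2.
    Assumption: ridge stands in for whatever supervised model the study uses."""
    mu, y_mu = X_tr.mean(axis=0), y_tr.mean()
    A = X_tr - mu
    w = np.linalg.solve(A.T @ A + alpha * np.eye(A.shape[1]), A.T @ (y_tr - y_mu))
    pred = (X_te - mu) @ w + y_mu
    ss_res = np.sum((y_te - pred) ** 2)
    ss_tot = np.sum((y_te - y_te.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def evaluate_settings(X, y, city, year, test_frac=0.2, seed=0):
    """Return {setting: {group: R^2}} for global, city-wise, year-wise, city-year.
    Each group gets its own model, trained and tested on a random split
    of that group's samples only (an assumed protocol)."""
    rng = np.random.default_rng(seed)
    groupings = {
        "global": np.full(len(y), "all"),
        "city-wise": np.asarray(city),
        "year-wise": np.asarray(year),
        "city-year": np.array([f"{c}-{t}" for c, t in zip(city, year)]),
    }
    results = {}
    for name, keys in groupings.items():
        results[name] = {}
        for key in np.unique(keys):
            idx = rng.permutation(np.flatnonzero(keys == key))
            n_test = max(1, int(test_frac * len(idx)))
            te, tr = idx[:n_test], idx[n_test:]
            results[name][str(key)] = ridge_r2(X[tr], y[tr], X[te], y[te])
    return results

# Synthetic stand-ins: 128-dim "embeddings", one indicator, 3 cities, 4 years.
rng = np.random.default_rng(42)
n, d = 1200, 128
X_raw = rng.normal(size=(n, d))
y = X_raw[:, :5].sum(axis=1) + 0.1 * rng.normal(size=n)
city = rng.choice(["A", "B", "C"], size=n)
year = rng.choice([2020, 2021, 2022, 2023], size=n)

X64 = pca_reduce(X_raw, 64)                      # fixed 64-dim budget
scores = evaluate_settings(X64, y, city, year)   # reduced features
scores_raw = evaluate_settings(X_raw, y, city, year)  # full features
```

Comparing `scores` with `scores_raw` group by group mirrors the idea of holding the feature budget fixed at 64 dimensions before comparing embedding families, while the spread of values within `"city-wise"` versus `"year-wise"` is the kind of contrast the abstract summarizes as spatial heterogeneity alongside temporal robustness.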