🤖 AI Summary
This study uncovers implicit racial/ethnic unfairness in large-scale mobility prediction models, manifested as systematic performance disparities across demographic groups. To address it, the authors propose Fairness-Guided Incremental Sampling (FGIS), a lightweight data intervention that operates without access to individual sensitive attributes: Size-Aware K-Means (SAKM) clusters users in latent mobility space under census-derived group proportions to produce proxy racial labels, and an incremental sampling algorithm then prioritizes users by expected performance gain and current group representation. Evaluated on a MetaPath2Vec model and a transformer-encoder model, FGIS preserves overall accuracy while reducing inter-group performance disparity by up to 40%, with the largest gains in early, low-resource sampling stages. The key contribution is the integration of census-derived demographic priors into the incremental sampling pipeline, establishing a scalable, deployable fairness-enhancement approach for large-scale mobility prediction when sensitive attributes are unavailable.
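The group-aware prioritization described above can be sketched as a greedy selection loop. This is purely illustrative: the scoring formula, the `(user_id, proxy_group, expected_gain)` candidate format, and the under-representation weighting are assumptions for the sketch, not the paper's exact method.

```python
def incremental_sample(candidates, target_props, budget):
    """Greedy group-aware incremental sampling (illustrative FGIS sketch).

    candidates:   list of (user_id, proxy_group, expected_gain) triples
    target_props: census-derived target share per proxy group, e.g. {"A": 0.5, "B": 0.5}
    budget:       number of users to add to the training set
    """
    counts = {g: 0 for g in target_props}  # users selected so far, per group
    chosen = []
    pool = list(candidates)
    while pool and len(chosen) < budget:
        total = len(chosen)

        def score(cand):
            _, group, gain = cand
            current = counts[group] / total if total else 0.0
            # Boost groups that fall below their census-derived target share
            # (hypothetical weighting; the paper does not state this formula).
            underrep = max(target_props[group] - current, 0.0) / target_props[group]
            return gain * (1.0 + underrep)

        best = max(pool, key=score)
        pool.remove(best)
        chosen.append(best[0])
        counts[best[1]] += 1
    return chosen


if __name__ == "__main__":
    cands = [("u1", "A", 0.5), ("u2", "B", 0.5), ("u3", "A", 0.4), ("u4", "B", 0.6)]
    print(incremental_sample(cands, {"A": 0.5, "B": 0.5}, budget=2))  # ['u4', 'u1']
```

After the first pick fills some of group B's quota, group A's under-representation boost pulls the next selection toward A even though a higher-gain B candidate remains, which is the intended trade-off between accuracy gain and demographic balance.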
📝 Abstract
Next location prediction underpins a growing number of mobility, retail, and public-health applications, yet its societal impacts remain largely unexplored. In this paper, we audit state-of-the-art mobility prediction models trained on a large-scale dataset, highlighting hidden disparities based on user demographics. Drawing on aggregate census data, we compute differences in predictive performance across racial and ethnic user groups and show a systematic disparity rooted in the underlying dataset, with large differences in accuracy across locations and user groups. To address this, we propose Fairness-Guided Incremental Sampling (FGIS), a group-aware sampling strategy designed for incremental data collection settings. Because individual-level demographic labels are unavailable, we introduce Size-Aware K-Means (SAKM), a clustering method that partitions users in latent mobility space while enforcing census-derived group proportions. This yields proxy racial labels for the four largest groups in the state: Asian, Black, Hispanic, and White. Built on these labels, our sampling algorithm prioritizes users based on expected performance gains and current group representation, incrementally constructing training datasets that reduce demographic performance gaps while preserving overall accuracy. Our method reduces total disparity between groups by up to 40% with minimal accuracy trade-offs, as evaluated on a state-of-the-art MetaPath2Vec model and a transformer-encoder model. Improvements are most significant in early sampling stages, highlighting the potential for fairness-aware strategies to deliver meaningful gains even in low-resource settings. Our findings expose structural inequities in mobility prediction pipelines and demonstrate how lightweight, data-centric interventions can improve fairness with little added complexity, especially for low-data applications.
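The size-constrained clustering idea behind SAKM can be sketched as k-means with capacity-respecting assignment: cluster sizes are fixed to census-derived proportions, and points are greedily assigned closest-pair-first until each cluster's quota is filled. This is a minimal sketch under assumed details (2-D latent embeddings, greedy assignment, largest-remainder rounding); the paper's actual SAKM procedure may differ.

```python
import random
from math import dist


def size_aware_kmeans(points, proportions, iters=20, seed=0):
    """Cluster points into len(proportions) groups whose sizes match the
    target proportions (illustrative sketch of size-constrained k-means)."""
    rng = random.Random(seed)
    k, n = len(proportions), len(points)
    # Largest-remainder rounding so cluster capacities sum exactly to n.
    caps = [int(p * n) for p in proportions]
    leftovers = sorted(range(k), key=lambda j: proportions[j] * n - caps[j], reverse=True)
    for j in leftovers[: n - sum(caps)]:
        caps[j] += 1
    centers = rng.sample(points, k)
    assign = [0] * n
    for _ in range(iters):
        # Greedy capacity-respecting assignment: closest (point, center) pairs first.
        pairs = sorted(
            (dist(points[i], centers[j]), i, j) for i in range(n) for j in range(k)
        )
        filled, placed = [0] * k, [False] * n
        for _, i, j in pairs:
            if not placed[i] and filled[j] < caps[j]:
                assign[i], placed[i] = j, True
                filled[j] += 1
        # Recompute each center as the mean of its assigned points.
        for j in range(k):
            members = [points[i] for i in range(n) if assign[i] == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return assign, caps


if __name__ == "__main__":
    rng = random.Random(1)
    pts = [(rng.random(), rng.random()) for _ in range(100)]
    labels, caps = size_aware_kmeans(pts, [0.4, 0.3, 0.2, 0.1])
    print([labels.count(j) for j in range(4)])  # [40, 30, 20, 10], matching caps
```

Because the greedy pass always fills every quota, the resulting cluster sizes match the census-derived proportions exactly, which is what lets the clusters serve as proxy group labels when individual demographics are unavailable.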