🤖 AI Summary
In urban environments, coarse GPS localization errors (2–20 m) combined with high spatial density of points of interest (POIs)—often exceeding 50 within a 100-meter radius—lead to ambiguous attribution of user visits. To address this, we propose a multi-source signal fusion framework for POI visit attribution. Our method introduces the first end-to-end Transformer-based architecture that jointly models individual spatiotemporal trajectories, sequential visit context (preceding and succeeding POIs), and collective behavioral patterns. It integrates kernel density estimation (KDE), fine-grained spatiotemporal feature encoding, and POI semantic embeddings. This design effectively mitigates noise interference and disambiguates visits among proximal, semantically similar POIs. Evaluated on a large-scale real-world dataset, our approach achieves an average accuracy improvement of 12.7% over state-of-the-art methods. Notably, it demonstrates superior robustness under severe GPS noise and high POI overlap—critical challenges in dense urban settings.
📝 Abstract
Accurately attributing user visits to specific Points of Interest (POIs) is a foundational task for mobility analytics, personalized services, marketing, and urban planning. However, POI attribution remains challenging because of GPS inaccuracies, typically ranging from 2 to 20 meters in real-world settings, and the high spatial density of POIs in urban environments, where many venues can coexist within a small radius (e.g., over 50 POIs within a 100-meter radius in dense city centers). Relying on proximity alone is therefore often insufficient for determining which POI was actually visited. We introduce POIFormer, a novel Transformer-based framework for accurate and efficient POI attribution. Unlike prior approaches that rely on limited spatiotemporal, contextual, or behavioral features, POIFormer combines a rich set of signals, including spatial proximity, visit timing and duration, contextual features from POI semantics, and behavioral features from user mobility and aggregated crowd behavior patterns, and uses the Transformer's self-attention mechanism to jointly model complex interactions across these dimensions. By leveraging the Transformer to model a user's past and future visits (with the current visit masked) and incorporating crowd-level behavioral patterns through pre-computed kernel density estimates (KDEs), POIFormer enables accurate, efficient attribution in large, noisy mobility datasets. Its architecture supports generalization across diverse data sources and geographic contexts while avoiding reliance on hard-to-access or unavailable data layers, making it practical for real-world deployment. Extensive experiments on real-world mobility datasets demonstrate significant improvements over existing baselines, particularly in challenging settings characterized by spatial noise and dense POI clustering.
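To illustrate why proximity alone can mislead and how a crowd-level KDE can help, here is a minimal sketch. It is not the POIFormer model: it only multiplies a Gaussian GPS-error likelihood by a kernel density estimate of crowd visit hours for each candidate POI. All function names, parameters (`bandwidth`, `gps_sigma_m`), and the toy data are assumptions made for illustration.

```python
import numpy as np

def hour_kde(visit_hours, query_hour, bandwidth=1.5):
    """Gaussian KDE over hour-of-day, using circular distance on a 24 h clock."""
    visit_hours = np.asarray(visit_hours, dtype=float)
    diff = np.abs(visit_hours - query_hour)
    diff = np.minimum(diff, 24.0 - diff)  # wrap around midnight
    kernels = np.exp(-0.5 * (diff / bandwidth) ** 2) / (bandwidth * np.sqrt(2 * np.pi))
    return float(kernels.mean())

def attribution_scores(distances_m, crowd_hours, query_hour, gps_sigma_m=10.0):
    """Normalized score per candidate POI: spatial likelihood times crowd density."""
    distances_m = np.asarray(distances_m, dtype=float)
    # Assumed isotropic Gaussian GPS error; gps_sigma_m is a hypothetical value.
    spatial = np.exp(-0.5 * (distances_m / gps_sigma_m) ** 2)
    temporal = np.array([hour_kde(h, query_hour) for h in crowd_hours])
    scores = spatial * temporal
    return scores / scores.sum()

# Hypothetical candidates: a cafe 12 m and a bar 8 m from the GPS fix.
distances = [12.0, 8.0]
crowd_hours = [
    [7.5, 8.0, 8.5, 9.0, 12.5],     # cafe: mostly morning visits
    [20.0, 21.0, 22.0, 23.0, 0.5],  # bar: mostly late-evening visits
]
scores = attribution_scores(distances, crowd_hours, query_hour=8.25)
nearest = int(np.argmin(distances))  # proximity-only choice
best = int(np.argmax(scores))        # fused choice
```

In this toy case the bar is nearer to the GPS fix, yet a visit at 08:15 is attributed to the cafe once crowd timing is taken into account. The abstract describes a learned model that additionally uses the user's preceding and following visits and POI semantics, none of which this sketch attempts to reproduce.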