🤖 AI Summary
This work addresses the challenge of personalization in mobile scenarios characterized by sparse data and dynamically evolving spatio-temporal contexts, where existing approaches struggle to simultaneously achieve rapid adaptation, noise robustness, and long-term generalization (including cold-start performance). To overcome this “impossibility triangle” of personalization, we propose U-MASK, which models user behavior as a partially observed spatio-temporal tensor and unifies short-term adaptation, long-term prediction, and cold-start handling through a user-adaptive masking mechanism. Our method introduces U-SCOPE, a compact, task-agnostic user representation, and dynamically allocates an evidence budget based on user reliability and task sensitivity. Coupled with a shared diffusion Transformer, U-MASK enables task-differentiated generation using only the mask pattern and user representation. Experiments on real-world mobile datasets demonstrate consistent improvements over state-of-the-art methods across short-term prediction, long-horizon forecasting, and cold-start settings, with the most pronounced gains under extreme data sparsity.
📝 Abstract
Personalized mobile artificial intelligence applications are widely deployed, yet they are expected to infer user behavior from sparse and irregular histories under a continuously evolving spatio-temporal context. This setting induces a fundamental tension among three requirements, i.e., immediacy to adapt to recent behavior, stability to resist transient noise, and generalization to support long-horizon prediction and cold-start users. Most existing approaches satisfy at most two of these requirements, resulting in an inherent impossibility triangle in data-scarce, non-stationary personalization. To address this challenge, we model mobile behavior as a partially observed spatio-temporal tensor and unify short-term adaptation, long-horizon forecasting, and cold-start recommendation as a conditional completion problem, where a user- and task-specific mask specifies which coordinates are treated as evidence. We propose U-MASK, a user-adaptive spatio-temporal masking method that allocates evidence budgets based on user reliability and task sensitivity. To enable mask generation under sparse observations, U-MASK learns a compact, task-agnostic user representation from app and location histories via U-SCOPE, which serves as the sole semantic conditioning signal. A shared diffusion transformer then performs mask-guided generative completion while preserving observed evidence, so personalization and task differentiation are governed entirely by the mask and the user representation. Experiments on real-world mobile datasets demonstrate consistent improvements over state-of-the-art methods across short-term prediction, long-horizon forecasting, and cold-start settings, with the largest gains under severe data sparsity. The code and dataset will be available at https://github.com/NICE-HKU/U-MASK.
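The conditional-completion framing above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The `TASK_SENSITIVITY` values, the product rule for the evidence budget, the uniform random choice of evidence coordinates, and the `generate` callable are all assumptions made for illustration. In U-MASK the mask is produced from the U-SCOPE user representation and the completion is performed by a shared diffusion transformer, neither of which is modeled here. The sketch only shows two ideas from the abstract: a user- and task-dependent budget decides how many observed coordinates count as evidence, and completion leaves those evidence coordinates unchanged.

```python
import numpy as np

# Hypothetical per-task sensitivities; the abstract does not give values.
TASK_SENSITIVITY = {"short_term": 0.9, "long_horizon": 0.5, "cold_start": 0.1}

def evidence_mask(observed, reliability, task, rng):
    """Mark which observed coordinates are treated as evidence.

    The budget rule (reliability times task sensitivity) is an assumption.
    """
    budget = reliability * TASK_SENSITIVITY[task]   # fraction of observations kept
    idx = np.flatnonzero(observed)                  # flat indices of observed cells
    n_keep = int(round(budget * idx.size))
    keep = rng.choice(idx, size=n_keep, replace=False)
    mask = np.zeros(observed.size, dtype=bool)
    mask[keep] = True
    return mask.reshape(observed.shape)

def complete(x_obs, mask, generate):
    """Fill non-evidence coordinates with generated values; keep evidence as-is."""
    x_gen = generate(x_obs * mask, mask)            # stand-in for the shared generator
    return np.where(mask, x_obs, x_gen)

def mean_fill(x_in, m):
    """Placeholder generator: predicts the mean of the evidence everywhere."""
    return np.full_like(x_in, x_in.sum() / max(m.sum(), 1))

rng = np.random.default_rng(0)
x = rng.random((24, 5, 8))                # toy tensor: time x location x app
observed = rng.random(x.shape) < 0.2      # sparse observation pattern
x_obs = x * observed

mask = evidence_mask(observed, reliability=0.8, task="short_term", rng=rng)
x_hat = complete(x_obs, mask, generate=mean_fill)
```

Under these assumptions, switching `task` to `"cold_start"` or lowering `reliability` shrinks the evidence set, so more of the tensor is left to the generator while the code path stays the same.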