🤖 AI Summary
Existing user interest modeling approaches suffer from two key limitations: (1) they treat behavioral sequences as noise-free preference signals, ignoring pervasive observational noise; and (2) they generate static, context-agnostic interest representations that fail to capture dynamic user intent. To address these, we propose the Contextual Diffusion Purifier (CDP), the first to introduce conditional denoising diffusion into user interest modeling, shifting the paradigm from conventional "identify-aggregate" to controllable purification. CDP uses four-way query-user-item-context interaction features to guide both forward noise injection and conditional reverse denoising, and applies category-aware filtering to the behavior sequence before purification. As a result, it dynamically generates context-sensitive interest representations. Extensive offline evaluations and large-scale online A/B tests show that CDP consistently outperforms state-of-the-art methods, with significant improvements in click-through rate prediction.
📝 Abstract
User behavior sequences in search systems resemble "interest fossils": they capture genuine intent but are eroded by exposure bias, category drift, and contextual noise. Current methods predominantly follow an "identify-aggregate" paradigm, which assumes that sequences immutably reflect user preferences and overlooks the entanglement of noise and genuine interest. Moreover, they output static, context-agnostic representations that cannot adapt to dynamic intent shifts under varying Query-User-Item-Context conditions.
To resolve this dual challenge, we propose the Contextual Diffusion Purifier (CDP). Treating category-filtered behaviors as "contaminated observations", CDP applies forward noising followed by conditional reverse denoising guided by cross-interaction features (Query × User × Item × Context). This controllably generates pure, context-aware interest representations that adapt to the scenario at hand. Extensive offline and online experiments demonstrate the superiority of CDP over state-of-the-art methods.
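The purification pipeline described above can be illustrated with a minimal sketch. The abstract does not specify the noise schedule, the denoiser architecture, or how the cross-interaction features are encoded, so everything here is an assumption: the linear beta schedule and the closed-form noising step are borrowed from the standard DDPM formulation, and the names `filter_by_category`, `mean_pool`, `forward_noise`, `denoise_step`, `eps_model`, and `cond` are hypothetical. It is not the authors' implementation.

```python
import math
import random


def alpha_bar(t, num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_s) for s = 1..t, linear beta schedule (assumed)."""
    prod = 1.0
    for s in range(1, t + 1):
        beta = beta_start + (beta_end - beta_start) * (s - 1) / max(num_steps - 1, 1)
        prod *= 1.0 - beta
    return prod


def filter_by_category(behaviors, target_category):
    """Keep embeddings of behaviors whose category matches the target (assumed filter rule)."""
    return [emb for cat, emb in behaviors if cat == target_category]


def mean_pool(vectors):
    """Average the vectors into one 'contaminated observation' x0 (pooling choice is assumed)."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def forward_noise(x0, t, num_steps, rng):
    """Sample x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps and return (x_t, eps)."""
    a = alpha_bar(t, num_steps)
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps


def denoise_step(xt, t, num_steps, cond, eps_model):
    """Estimate the clean vector from x_t using a noise predictor conditioned on `cond`.

    `eps_model(xt, t, cond)` stands in for a learned network; `cond` stands in for
    the Query x User x Item x Context cross features. Both are placeholders.
    """
    a = alpha_bar(t, num_steps)
    eps_hat = eps_model(xt, t, cond)
    return [(x - math.sqrt(1.0 - a) * e) / math.sqrt(a) for x, e in zip(xt, eps_hat)]
```

With a perfect noise predictor, `denoise_step` recovers the input of `forward_noise` up to floating-point error. In a trained system the predictor would be learned, and the conditioning vector is what would steer the recovered representation toward the current query and context.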