🤖 AI Summary
Existing CTR models struggle to model multi-granularity user interests within long-term behavior sequences, as well as the temporal and interactive relationships among those behaviors. To address this, we propose a two-stage optimization framework: (1) in the first stage, behavior sequences at multiple time scales are used to construct multi-granularity retrieval queries, broadening interest coverage; (2) in the second stage, a novel Multi-Head Fourier Transformer efficiently captures temporal and interactive patterns within the retrieved subsequences, and a Multi-Head Target Attention mechanism adaptively fuses the resulting multi-granularity interest representations. The framework unifies long-sequence encoding, retrieval, and fusion in an end-to-end manner. Extensive experiments on multiple public benchmarks demonstrate significant improvements over state-of-the-art methods. Online A/B testing on the Huawei Music App shows a 1.32% increase in the average number of songs played and a 0.55% increase in average listening time.
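As a rough illustration of the first stage, the sketch below builds one query from the target item alone (the single-query approach existing methods use) plus one query per time scale, each the mean embedding of the user's most recent `w` behaviors, and then retrieves the top-`k` behaviors for each query by dot-product score. The window sizes, mean pooling, dot-product relevance, and all function names here are hypothetical choices for illustration; MIRRN's actual query construction and retrieval may differ.

```python
from typing import List, Sequence

Vector = List[float]

def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product, used here as an assumed retrieval relevance score."""
    return sum(x * y for x, y in zip(a, b))

def mean(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise average of a non-empty list of embeddings."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def build_queries(target: Vector, history: List[Vector], windows: Sequence[int]) -> List[Vector]:
    """Query 0 is the target item alone; each extra query summarizes the most
    recent `w` behaviors (one per hypothetical time scale, w must be positive)."""
    queries = [list(target)]
    for w in windows:
        recent = history[-w:]
        if recent:  # skip a time scale when there is no history to summarize
            queries.append(mean(recent))
    return queries

def retrieve(query: Vector, history: List[Vector], k: int) -> List[int]:
    """Indices of the k behaviors most similar to `query`, in chronological order."""
    ranked = sorted(range(len(history)), key=lambda i: dot(query, history[i]), reverse=True)
    return sorted(ranked[:k])

# Toy embeddings, oldest first: older behaviors match the target, recent ones do not.
history = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
queries = build_queries(target=[1.0, 0.0], history=history, windows=(2,))
subsequences = [retrieve(q, history, k=2) for q in queries]
print(subsequences)  # [[0, 1], [2, 3]]
```

In this toy case the target-only query retrieves only the older, target-like behaviors, while the short-window query surfaces the user's recent interest, which is the kind of coverage gap multi-granularity queries are meant to close.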
📝 Abstract
Click-through rate (CTR) prediction is crucial for online personalization platforms. Recent advances have shown that modeling rich user behaviors can significantly improve CTR prediction performance. Current long-term user behavior modeling algorithms predominantly follow two cascading stages. The first stage retrieves a subsequence related to the target item from the long-term behavior sequence, while the second stage models the relationship between the subsequence and the target item. Despite significant progress, these methods have two critical flaws. First, the retrieval query typically includes only target item information, limiting the ability to capture the user's diverse interests. Second, relational information within the subsequence, such as sequential and interactive information, is frequently overlooked, although it must be further mined to model user interests more accurately. To this end, we propose the Multi-granularity Interest Retrieval and Refinement Network (MIRRN). Specifically, we first construct queries based on behaviors observed at different time scales to obtain subsequences, each capturing the user's interests at a different granularity. We then introduce a novel multi-head Fourier transformer to efficiently learn sequential and interactive information within the subsequences, leading to more accurate modeling of user interests. Finally, we employ multi-head target attention to adaptively assess the impact of these multi-granularity interests on the target item. Extensive experiments demonstrate that MIRRN significantly outperforms state-of-the-art baselines. Furthermore, an A/B test on the Huawei Music App shows that MIRRN increases the average number of songs listened to by 1.32% and the average listening time by 0.55%. The implementation code is publicly available at https://github.com/USTC-StarTeam/MIRRN.
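To make the second stage more concrete, here is a minimal pure-Python sketch under stated assumptions. It treats the multi-head Fourier transformer as FNet-style token mixing (the real part of a 2D discrete Fourier transform over the sequence and hidden axes) applied separately to each head's slice of the hidden dimension, and it implements multi-head target attention as per-head scaled dot-product weighting of the target embedding over one pooled vector per interest granularity. The head layout, mean pooling, the naive DFT loop (an efficient version would use an FFT, which is where the efficiency of Fourier mixing comes from), and the omission of learned projections, residual connections, and feed-forward layers are all simplifying assumptions; function names such as `multi_head_fourier_mix` are hypothetical and not taken from the MIRRN repository.

```python
import cmath
import math
from typing import List

Matrix = List[List[float]]

def dft2_real(x: Matrix) -> Matrix:
    """Real part of the 2D DFT over the (sequence, hidden) axes, FNet-style.
    Naive O(L^2 * d^2) loop for readability; a fast version would use an FFT."""
    L, d = len(x), len(x[0])
    out = [[0.0] * d for _ in range(L)]
    for l in range(L):
        for k in range(d):
            acc = 0j
            for m in range(L):
                for n in range(d):
                    acc += x[m][n] * cmath.exp(-2j * math.pi * (l * m / L + k * n / d))
            out[l][k] = acc.real
    return out

def multi_head_fourier_mix(x: Matrix, num_heads: int) -> Matrix:
    """Assumed multi-head variant: split the hidden dim into equal heads,
    mix each head's (L x d_h) slice independently, then concatenate."""
    d = len(x[0])
    if d % num_heads != 0:
        raise ValueError("hidden size must be divisible by num_heads")
    dh = d // num_heads
    heads = [dft2_real([row[h * dh:(h + 1) * dh] for row in x]) for h in range(num_heads)]
    return [[v for h in range(num_heads) for v in heads[h][l]] for l in range(len(x))]

def mean_pool(x: Matrix) -> List[float]:
    """Collapse a mixed subsequence into one interest vector (assumed pooling)."""
    return [sum(col) / len(x) for col in zip(*x)]

def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def multi_head_target_attention(target: List[float], interests: Matrix, num_heads: int) -> List[float]:
    """Per head: weight each granularity's interest vector by its scaled dot product
    with the target, sum the weighted vectors, then concatenate the heads."""
    dh = len(target) // num_heads
    fused: List[float] = []
    for h in range(num_heads):
        sl = slice(h * dh, (h + 1) * dh)
        q = target[sl]
        scores = [sum(a * b for a, b in zip(q, v[sl])) / math.sqrt(dh) for v in interests]
        weights = softmax(scores)
        fused.extend(sum(w * v[sl][j] for w, v in zip(weights, interests)) for j in range(dh))
    return fused

# Toy usage: two retrieved subsequences (one per granularity), hidden size 2.
subsequences = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.5], [1.5, 0.5]]]
interests = [mean_pool(multi_head_fourier_mix(s, num_heads=2)) for s in subsequences]
fused = multi_head_target_attention([1.0, 0.0], interests, num_heads=2)
print(len(fused))  # 2
```

Because Fourier mixing has no learned parameters, each position ends up depending on every other position in the subsequence at the cost of a transform rather than a pairwise attention map; the target attention step then lets the target item decide how much each granularity contributes to the final representation.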