🤖 AI Summary
Existing RL-based multi-task fusion (MTF) recommendation methods model only user features, neglecting item and contextual information. To address this limitation, we propose a user-item joint enhancement paradigm for state representation which, for the first time, unifies users, items, and heterogeneous auxiliary features into a fine-grained joint state. Building on this enriched state space, we design a tailored Actor-Critic architecture and employ a PPO variant for end-to-end optimization. The proposed method has run stably in large-scale online services for over six months, yielding significant improvements in user effective consumption (+3.84%) and session dwell time (+0.58%). Offline evaluations show superior AUC and NDCG compared with state-of-the-art RL-MTF approaches. These results confirm that joint state modeling is critical to improving multi-task recommendation performance.
📝 Abstract
As the final key stage of Recommender Systems (RSs), Multi-Task Fusion (MTF) combines the multiple scores predicted by Multi-Task Learning (MTL) into a single score to maximize user satisfaction, and thus determines the ultimate recommendation results. In recent years, to maximize long-term user satisfaction within a recommendation session, Reinforcement Learning (RL) has been widely used for MTF in large-scale RSs. However, limited by their modeling pattern, all current RL-MTF methods can only use user features as the state to generate an action for each user; they cannot exploit item features and other valuable features, which leads to suboptimal results. Addressing this problem requires breaking through the current modeling pattern of RL-MTF. To this end, we propose a novel method called Enhanced-State RL for MTF in RSs. Unlike the existing methods mentioned above, our method first defines user features, item features, and other valuable features collectively as the enhanced state, and then proposes a novel actor and critic learning process that uses the enhanced state to take much better actions for each user-item pair. To the best of our knowledge, this modeling pattern is proposed for the first time in the field of RL-MTF. We conduct extensive offline and online experiments in a large-scale RS. The results demonstrate that our model significantly outperforms other models. Enhanced-State RL has been fully deployed in our RS for more than half a year, improving user valid consumption by +3.84% and user duration time by +0.58% compared to the baseline.
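To make the enhanced-state idea concrete, here is a minimal, purely illustrative sketch: a toy linear "actor" maps the enhanced state (user, item, and context features concatenated) to per-task fusion weights, which MTF then uses to combine the MTL scores into one final ranking score. All function names, dimensions, and values are hypothetical; the paper's actual actor-critic networks and PPO training are not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def actor_fusion_weights(user_feat, item_feat, ctx_feat, W, b):
    """Toy actor: maps the enhanced state (user + item + context
    features concatenated) to per-task fusion weights via a single
    linear layer followed by a softmax. Hypothetical stand-in for
    the learned actor network described in the paper."""
    state = np.concatenate([user_feat, item_feat, ctx_feat])
    logits = W @ state + b
    exp = np.exp(logits - logits.max())  # stable softmax
    return exp / exp.sum()

def fused_score(mtl_scores, weights):
    """MTF step: combine the MTL task scores into one final score."""
    return float(np.dot(mtl_scores, weights))

# Toy dimensions: 4 user, 4 item, 2 context features; 3 MTL tasks.
W = rng.normal(size=(3, 10))
b = np.zeros(3)
user_feat = rng.normal(size=4)
item_feat = rng.normal(size=4)  # item features enter the state, so the
ctx_feat = rng.normal(size=2)   # action is specific to each user-item pair

mtl_scores = np.array([0.7, 0.2, 0.9])  # e.g. pCTR, pLike, pFinish
w = actor_fusion_weights(user_feat, item_feat, ctx_feat, W, b)
score = fused_score(mtl_scores, w)
```

Because the item features are part of the state, the actor can emit a different weight vector for every candidate item, rather than one action per user as in prior user-state-only RL-MTF methods.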