Hierarchical Reinforcement Learning for Temporal Abstraction of Listwise Recommendation

2024-09-11
arXiv.org
Citations: 0
Influential: 0
AI Summary
To address the challenge of jointly modeling long-term user perceptions and short-term interest dynamics in listwise recommendation, this paper proposes mccHRL, a hierarchical reinforcement learning framework: a high-level agent captures cross-session evolution of user perceptions, while a low-level agent models sequential item selection within each session. mccHRL is the first to systematically incorporate temporal abstraction into listwise recommendation, explicitly decoupling cross-session and intra-session contexts, thereby mitigating both policy-space explosion and user feedback sparsity. Extensive experiments on both simulated environments and industrial-scale real-world datasets demonstrate that mccHRL significantly outperforms state-of-the-art baselines. The code and datasets are publicly released.

Abstract
Modern listwise recommendation systems need to consider both long-term user perceptions and short-term interest shifts. Reinforcement learning can be applied to recommendation to study this problem, but it suffers from a large search space, sparse user feedback, and long interaction latency. Motivated by recent progress in hierarchical reinforcement learning, we propose a novel framework called mccHRL that provides different levels of temporal abstraction for listwise recommendation. Within the hierarchical framework, the high-level agent studies the evolution of user perception, while the low-level agent produces the item selection policy by modeling the process as a sequential decision-making problem. We argue that this framework yields a well-defined decomposition of the outra-session context and the intra-session context, which are encoded by the high-level and low-level agents, respectively. To verify this argument, we implement both a simulator-based environment and an industrial dataset-based experiment. The results show significant performance improvements for our method compared with several well-known baselines. Data and code have been made public.
Problem

Research questions and friction points this paper is trying to address.

Address long-term and short-term user interests in recommendations
Shrink the search space and mitigate sparse feedback in reinforcement learning
Decompose user context into session-level and item-level decisions
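
To make the search-space point concrete, the following is a small arithmetic sketch, not taken from the paper. It compares the number of distinct ordered lists of K items drawn from N candidates, N! / (N - K)!, with the number of item scores evaluated when the list is instead built one item at a time. The function names and the example sizes (1000 candidates, lists of 10) are assumptions chosen for illustration.

```python
import math


def listwise_action_count(num_candidates: int, list_size: int) -> int:
    """Ordered lists of distinct items: N! / (N - K)!."""
    return math.perm(num_candidates, list_size)


def sequential_choice_count(num_candidates: int, list_size: int) -> int:
    """Item scores evaluated when one item is picked per step.

    At step t there are N - t items left to score, so the total is
    the sum of (N - t) for t = 0 .. K - 1.
    """
    return sum(num_candidates - step for step in range(list_size))


# Illustrative sizes only (assumed, not from the paper).
n, k = 1000, 10
print("whole-list actions:", listwise_action_count(n, k))
print("sequential item scores:", sequential_choice_count(n, k))
```

Treating the list as a single action grows factorially with the list size, while sequential selection grows roughly as N times K. This is the motivation for casting item selection as a sequential decision problem.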
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical reinforcement learning for temporal abstraction
High-level agent models user perception evolution
Low-level agent handles item selection policy
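
The division of labor between the two agents can be sketched roughly as below. This is a minimal control-flow illustration under stated assumptions, not the mccHRL implementation: the class names, the moving-average update, the greedy scoring rule, and the diversity penalty are all hypothetical stand-ins for the learned policies described in the paper. The high-level object changes only between sessions (cross-session context), while the low-level object picks items one at a time and conditions each pick on what is already in the list (intra-session context).

```python
from dataclasses import dataclass, field
from typing import Dict, List


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


@dataclass
class HighLevelAgent:
    """Hypothetical stand-in: tracks user perception across sessions."""
    dim: int
    step_size: float = 0.3
    perception: List[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.perception:
            self.perception = [0.0] * self.dim

    def update(self, clicked: List[List[float]]) -> None:
        """Once per session: move toward the mean clicked embedding."""
        if not clicked:
            return
        mean = [sum(col) / len(clicked) for col in zip(*clicked)]
        self.perception = [
            (1 - self.step_size) * p + self.step_size * m
            for p, m in zip(self.perception, mean)
        ]


@dataclass
class LowLevelAgent:
    """Hypothetical stand-in: greedy sequential item selection."""
    diversity_penalty: float = 0.5

    def _score(self, emb, goal, chosen_embs) -> float:
        relevance = dot(goal, emb)
        # Intra-session context: penalize overlap with items already picked.
        redundancy = max((dot(emb, c) for c in chosen_embs), default=0.0)
        return relevance - self.diversity_penalty * redundancy

    def build_list(self, goal: List[float],
                   candidates: Dict[str, List[float]],
                   list_size: int) -> List[str]:
        chosen: List[str] = []
        remaining = dict(candidates)
        for _ in range(min(list_size, len(remaining))):
            chosen_embs = [candidates[c] for c in chosen]
            best = max(
                remaining,
                key=lambda i: self._score(remaining[i], goal, chosen_embs),
            )
            chosen.append(best)
            del remaining[best]
        return chosen


# Toy usage with made-up 2-d item embeddings.
items = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
high = HighLevelAgent(dim=2)
high.update([[1.0, 0.0]])  # feedback from a previous session
low = LowLevelAgent()
print(low.build_list(high.perception, items, list_size=2))
```

In a learned system both heuristics would be replaced by trained policies. The sketch only shows the two time scales: one update per session at the top, one decision per list slot at the bottom.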