Fairness Begins with State: Purifying Latent Preferences for Hierarchical Reinforcement Learning in Interactive Recommendation

📅 2026-03-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing fair recommendation methods often struggle to balance accuracy and fairness because they mistakenly treat observed interactions—contaminated by popularity and exposure biases—as genuine user preferences. This work identifies this limitation as a failure in state estimation and proposes a Denoising State Representation Module (DSRM) based on diffusion models to recover users’ true latent states. To further disentangle long-term fairness from short-term utility, the approach integrates hierarchical reinforcement learning (HRL). Evaluated in high-fidelity simulation environments KuaiRec and KuaiRand, the method effectively disrupts the “rich-get-richer” feedback loop and achieves a superior Pareto frontier between recommendation utility and exposure fairness.

📝 Abstract
Interactive recommender systems (IRS) are increasingly optimized with Reinforcement Learning (RL) to capture the sequential nature of user-system dynamics. However, existing fairness-aware methods often suffer from a fundamental oversight: they assume the observed user state is a faithful representation of true preferences. In reality, implicit feedback is contaminated by popularity-driven noise and exposure bias, creating a distorted state that misleads the RL agent. We argue that the persistent conflict between accuracy and fairness is not merely a reward-shaping issue, but a state estimation failure. In this work, we propose DSRM-HRL, a framework that reformulates fairness-aware recommendation as a latent state purification problem followed by decoupled hierarchical decision-making. We introduce a Denoising State Representation Module (DSRM) based on diffusion models to recover the low-entropy latent preference manifold from high-entropy, noisy interaction histories. Built upon this purified state, a Hierarchical Reinforcement Learning (HRL) agent is employed to decouple conflicting objectives: a high-level policy regulates long-term fairness trajectories, while a low-level policy optimizes short-term engagement under these dynamic constraints. Extensive experiments on high-fidelity simulators (KuaiRec, KuaiRand) demonstrate that DSRM-HRL effectively breaks the "rich-get-richer" feedback loop, achieving a superior Pareto frontier between recommendation utility and exposure equity.
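The abstract describes two components: a denoising module that purifies the observed (bias-contaminated) state, and a hierarchical agent where a high-level policy constrains exposure fairness while a low-level policy ranks for engagement. The toy sketch below illustrates that division of labor only; the denoiser, the fairness-budget mechanism, and all names (`denoise_state`, `recommend`, `fairness_budget`) are illustrative assumptions, not the paper's actual diffusion model or HRL algorithm.

```python
import numpy as np

def denoise_state(noisy_state, steps=10, lr=0.3, prior_mean=0.0):
    """Toy stand-in for the DSRM idea: treat the observed interaction
    state as a noisy sample and iteratively shrink it toward a
    low-entropy prior (mimicking reverse diffusion, not implementing it)."""
    s = np.asarray(noisy_state, dtype=float).copy()
    for _ in range(steps):
        s = s - lr * (s - prior_mean) * 0.1  # small step toward the prior
    return s

def recommend(scores, popularity, fairness_budget, k=3):
    """Toy stand-in for the HRL decoupling: the high-level policy is
    abstracted into a fairness_budget capping how many popular items
    (popularity > 0.5) may appear; the low-level policy greedily ranks
    the remaining slots by predicted engagement score."""
    order = np.argsort(-scores)
    chosen, popular_used = [], 0
    for i in order:
        if popularity[i] > 0.5:
            if popular_used >= fairness_budget:
                continue  # budget exhausted: skip further popular items
            popular_used += 1
        chosen.append(int(i))
        if len(chosen) == k:
            break
    return chosen

scores = np.array([0.9, 0.8, 0.7, 0.4, 0.3])
popularity = np.array([0.9, 0.8, 0.1, 0.2, 0.9])
print(recommend(scores, popularity, fairness_budget=1))  # → [0, 2, 3]
```

With a budget of 1, the second-highest-scoring item (index 1) is skipped because it is popular and the budget is spent on item 0, so long-tail items 2 and 3 fill the slate; raising the budget recovers the pure accuracy-maximizing ranking, which is the utility-fairness trade-off the high-level policy is said to regulate over time.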
Problem

Research questions and friction points this paper is trying to address.

fairness
state estimation
exposure bias
interactive recommendation
preference distortion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Latent State Purification
Diffusion Models
Hierarchical Reinforcement Learning
Exposure Bias Mitigation
Fairness-Aware Recommendation
Yun Lu
Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences; Chongqing School, University of Chinese Academy of Sciences, Chongqing, China
Xiaoyu Shi
MMLab, The Chinese University of Hong Kong
computer vision
Hong Xie
University of Science and Technology of China (USTC)
Data Science/Mining, Online Learning
Xiangyu Zhao
Associate Professor, City University of Hong Kong
Recommendations, Large Language Models (LLMs), Trustworthy AI, Search Engine, Urban Computing
Mingsheng Shang
Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences; Chongqing School, University of Chinese Academy of Sciences, Chongqing, China