Potent but Stealthy: Rethink Profile Pollution against Sequential Recommendation via Bi-level Constrained Reinforcement Paradigm

📅 2025-11-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing Profile Pollution Attacks (PPAs) on sequential recommendation systems suffer from two key limitations: (1) over-reliance on coarse-grained, sequence-level perturbations, which prevents fine-grained manipulation at the item-transition level; and (2) holistic profile modifications that induce substantial distribution shifts, increasing detectability. This paper proposes CREAT, a stealthy and precise fine-grained user-behavior pollution attack. Methodologically, it introduces a bi-level optimization framework integrated with multi-reward reinforcement learning (RL), combining pattern-inversion rewards with distribution-consistency rewards computed via unbalanced co-optimal transport. It further employs a constrained group-relative RL paradigm with dynamic barrier constraints and group-shared experience replay, enabling step-wise perturbation optimization. Experiments demonstrate substantial improvements in attack success rates across multiple state-of-the-art sequential recommenders, while reducing detection rates by 30–52% compared to existing PPAs.

📝 Abstract
Sequential recommenders, which exploit dynamic user intents through interaction sequences, are vulnerable to adversarial attacks. While existing attacks primarily rely on data poisoning, they require large-scale user access or fake profiles and thus lack practicality. In this paper, we focus on the Profile Pollution Attack (PPA), which subtly contaminates partial user interactions to induce targeted mispredictions. Previous PPA methods suffer from two limitations: i) over-reliance on sequence-horizon impact restricts fine-grained perturbations on item transitions, and ii) holistic modifications cause detectable distribution shifts. To address these challenges, we propose CREAT, a constrained reinforcement-driven attack that synergizes a bi-level optimization framework with multi-reward reinforcement learning to balance adversarial efficacy and stealthiness. We first develop a Pattern Balanced Rewarding Policy, which integrates pattern inversion rewards to invert critical patterns and distribution consistency rewards to minimize detectable shifts via unbalanced co-optimal transport. We then employ a Constrained Group Relative Reinforcement Learning paradigm, enabling step-wise perturbations through dynamic barrier constraints and group-shared experience replay, achieving targeted pollution with minimal detectability. Extensive experiments demonstrate the effectiveness of CREAT.
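The multi-reward design sketched in the abstract (an attack-efficacy term traded off against a distribution-consistency term, with rewards scored relative to a sampled group) can be illustrated with a minimal sketch. The function names, the linear weighting `lam`, and the score inputs are hypothetical; the paper's exact reward composition and normalization are not reproduced here.

```python
import numpy as np

def combined_reward(attack_scores, shift_penalties, lam=0.5):
    """Scalar reward per perturbed sequence: attack efficacy minus a
    weighted distribution-shift penalty (hypothetical weighting)."""
    return np.asarray(attack_scores, dtype=float) - lam * np.asarray(shift_penalties, dtype=float)

def group_relative_advantages(rewards):
    """Group-relative scoring: normalize each reward against the mean
    and std of its sampled group, so updates favor perturbations that
    beat their peers rather than an absolute baseline."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

# A group of 4 candidate perturbations for one user profile.
rewards = combined_reward([0.9, 0.4, 0.7, 0.2], [0.3, 0.1, 0.5, 0.2])
adv = group_relative_advantages(rewards)
```

With this scoring, the candidate with the best efficacy/stealth trade-off within its group receives the largest positive advantage, which is what drives the step-wise policy update.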
Problem

Research questions and friction points this paper is trying to address.

Proposes stealthy profile pollution attacks on sequential recommender systems
Addresses limitations of detectable distribution shifts in existing methods
Balances attack effectiveness with stealthiness using constrained reinforcement learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bi-level optimization framework balances attack efficacy and stealthiness
Pattern Balanced Rewarding Policy inverts critical patterns while preserving distribution consistency
Constrained Group Relative Reinforcement Learning minimizes detectability via dynamic barrier constraints