Stealthy LLM-Driven Data Poisoning Attacks Against Embedding-Based Retrieval-Augmented Recommender Systems

📅 2025-05-08
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work exposes a stealthy provider-side data poisoning attack against retrieval-augmented (RAG-based) recommender systems: an attacker can significantly elevate the rankings of long-tail items (or suppress head items) by perturbing fewer than 1% of tokens in item descriptions, e.g., injecting sentiment words or substituting semantically similar phrases. We formally define a novel attack model constrained by semantic similarity, enabling imperceptible token-level edits. Our method leverages LLM-powered semantic retrieval, embedding-space-guided keyword injection, and bounded perturbations to preserve textual fluency and evade detection. Experiments on MovieLens demonstrate an attack success rate of 83.6%, with robust evasion of standard anomaly detectors. The results reveal that RAG recommenders are critically vulnerable to minute metadata manipulations, underscoring the urgent need for rigorous text consistency verification and provenance-aware auditing mechanisms.

๐Ÿ“ Abstract
We present a systematic study of provider-side data poisoning in retrieval-augmented (RAG-based) recommender systems. By modifying only a small fraction of tokens within item descriptions (for instance, adding emotional keywords or borrowing phrases from semantically related items), an attacker can significantly promote or demote targeted items. We formalize these attacks under token-edit and semantic-similarity constraints, and we examine their effectiveness in both promotion (long-tail items) and demotion (short-head items) scenarios. Our experiments on MovieLens, using two large language model (LLM) retrieval modules, show that even subtle attacks shift final rankings and item exposures while eluding naive detection. The results underscore the vulnerability of RAG-based pipelines to small-scale metadata rewrites and emphasize the need for robust textual consistency checks and provenance tracking to thwart stealthy provider-side poisoning.
Problem

Research questions and friction points this paper is trying to address.

Stealthy LLM-driven data poisoning in RAG-based recommender systems
Promoting or demoting items via small token edits in descriptions
Vulnerability of RAG pipelines to metadata manipulation attacks
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-driven subtle token modifications for poisoning
Token-edit and semantic-similarity constrained attacks
Metadata rewrites to manipulate item rankings