Exploring Backdoor Attack and Defense for LLM-empowered Recommendations

📅 2025-04-15

📈 Citations: 0

✨ Influential: 0

career value

208K/year

🤖 AI Summary

This work first systematically exposes the severe security vulnerability of backdoor attacks against large language model–enhanced recommender systems (LLM-RecSys). To address the threat where injecting triggers into item titles manipulates recommendation outputs, we propose BadRec—a novel end-to-end backdoor attack framework that poisons only 1% of training data by perturbing item titles and injecting fake user interactions. Concurrently, we design P-Scanner, a general-purpose defense method leveraging LLMs’ semantic understanding and a trigger-augmented proxy to detect poisoned samples. Our contributions include: (i) the first transferable backdoor attack and defense framework specifically for LLM-RecSys; and (ii) a trigger-augmented proxy that enhances LLMs’ domain-aware detection of poisoning patterns. Experiments demonstrate BadRec’s high attack success rate and show that P-Scanner achieves an average detection accuracy of 92.7% across three real-world datasets, significantly outperforming conventional baselines.

Technology Category

Application Category

📝 Abstract

The fusion of Large Language Models (LLMs) with recommender systems (RecSys) has dramatically advanced personalized recommendations and drawn extensive attention. Despite the impressive progress, the safety of LLM-based RecSys against backdoor attacks remains largely under-explored. In this paper, we raise a new problem: Can a backdoor with a specific trigger be injected into LLM-based Recsys, leading to the manipulation of the recommendation responses when the backdoor trigger is appended to an item's title? To investigate the vulnerabilities of LLM-based RecSys under backdoor attacks, we propose a new attack framework termed Backdoor Injection Poisoning for RecSys (BadRec). BadRec perturbs the items' titles with triggers and employs several fake users to interact with these items, effectively poisoning the training set and injecting backdoors into LLM-based RecSys. Comprehensive experiments reveal that poisoning just 1% of the training data with adversarial examples is sufficient to successfully implant backdoors, enabling manipulation of recommendations. To further mitigate such a security threat, we propose a universal defense strategy called Poison Scanner (P-Scanner). Specifically, we introduce an LLM-based poison scanner to detect the poisoned items by leveraging the powerful language understanding and rich knowledge of LLMs. A trigger augmentation agent is employed to generate diverse synthetic triggers to guide the poison scanner in learning domain-specific knowledge of the poisoned item detection task. Extensive experiments on three real-world datasets validate the effectiveness of the proposed P-Scanner.

Problem

Research questions and friction points this paper is trying to address.

Investigating backdoor attack vulnerabilities in LLM-based recommender systems

Proposing BadRec framework to inject backdoors via poisoned training data

Developing P-Scanner defense to detect poisoned items using LLMs

Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes BadRec for backdoor attacks on LLM RecSys

Uses 1% poisoned data to manipulate recommendations

Introduces P-Scanner with LLM for poison detection

🔎 Similar Papers

A Survey of Recent Backdoor Attacks and Defenses in Large Language Models