Retrievals Can Be Detrimental: A Contrastive Backdoor Attack Paradigm on Retrieval-Augmented Diffusion Models

📅 2025-01-23
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work identifies a novel, stealthy backdoor threat that arises when Retrieval-Augmented Generation (RAG) mechanisms are integrated into diffusion models, yielding Retrieval-Augmented Diffusion Models (RDMs). To address it, the authors propose BadRDM, a first-of-its-kind multimodal contrastive backdoor attack framework tailored to RDMs. BadRDM precisely hijacks retrieval outputs by poisoning the retrieval database and selecting toxic proxies with an entropy-driven criterion, while remaining stealthy through generative data augmentation and by modeling the mapping from text triggers to image proxies. Evaluated on two mainstream RDM tasks, BadRDM achieves attack success rates above 92% with benign performance degradation below 1.5%, substantially outperforming prior methods. The study is the first systematic exposition of security vulnerabilities in RAG-enhanced generative models and establishes a new benchmark for multimodal backdoor defense research.
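The summary names an "entropy-driven toxic proxy selection" step but gives no detail here. Purely as a hedged illustration of what an entropy criterion over a retrieval database could look like, the sketch below ranks candidate proxy images by the entropy of their softmax similarity distribution against the database; the low-entropy preference, the function name `rank_proxy_candidates`, and every parameter are assumptions for illustration, not the paper's method.

```python
import torch
import torch.nn.functional as F

def rank_proxy_candidates(cand_embs: torch.Tensor,
                          db_embs: torch.Tensor,
                          tau: float = 0.07) -> torch.Tensor:
    """Hypothetical entropy-based ranking of candidate toxicity proxies.

    cand_embs: (C, D) embeddings of candidate proxy images
    db_embs:   (N, D) embeddings of the clean retrieval database
    Returns candidate indices sorted by ascending similarity entropy.
    """
    cand_embs = F.normalize(cand_embs, dim=-1)
    db_embs = F.normalize(db_embs, dim=-1)
    # Softmax similarity of each candidate against the whole database.
    probs = F.softmax(cand_embs @ db_embs.t() / tau, dim=-1)   # (C, N)
    # Assumption: a low-entropy candidate occupies a sparse, distinctive
    # region of the embedding space, making retrieval easier to hijack.
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    return torch.argsort(entropy)  # most distinctive candidates first
```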

📝 Abstract
Diffusion models (DMs) have recently demonstrated remarkable generation capability. However, their training generally requires huge computational resources and large-scale datasets. To address these issues, recent studies empower DMs with the advanced Retrieval-Augmented Generation (RAG) technique and propose retrieval-augmented diffusion models (RDMs). By incorporating rich knowledge from an auxiliary database, RAG enhances diffusion models' generation and generalization ability while significantly reducing model parameters. Despite this great success, RAG may introduce novel security issues that warrant further investigation. In this paper, we reveal that RDMs are susceptible to backdoor attacks by proposing a multimodal contrastive attack approach named BadRDM. Our framework fully accounts for RAG's characteristics and is devised to manipulate the retrieved items for given text triggers, thereby further controlling the generated contents. Specifically, we first insert a tiny portion of images into the retrieval database as target toxicity surrogates. Subsequently, a malicious variant of contrastive learning is adopted to inject backdoors into the retriever, building shortcuts from the triggers to the toxicity surrogates. Furthermore, we enhance the attack through novel entropy-based selection and generative augmentation strategies that derive better toxicity surrogates. Extensive experiments on two mainstream tasks demonstrate that the proposed BadRDM achieves outstanding attack effects while preserving the model's benign utility.
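No reference code accompanies this page. The following is a minimal sketch of what a "malicious variant of contrastive learning" could look like for a CLIP-style dual-encoder retriever: clean pairs keep a standard InfoNCE alignment, while captions carrying the trigger are pulled toward a toxicity-surrogate embedding. The loss shape, the weighting `lam`, and all identifiers are assumptions for illustration, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def backdoored_contrastive_loss(text_emb: torch.Tensor,
                                img_emb: torch.Tensor,
                                surrogate_emb: torch.Tensor,
                                is_triggered: torch.Tensor,
                                tau: float = 0.07,
                                lam: float = 1.0) -> torch.Tensor:
    """Illustrative backdoored contrastive objective (not the paper's loss).

    text_emb:      (B, D) text embeddings from the retriever's text encoder
    img_emb:       (B, D) paired image embeddings (clean alignment targets)
    surrogate_emb: (D,)   embedding of the attacker's toxicity surrogate
    is_triggered:  (B,)   bool mask marking captions containing the trigger
    """
    text_emb = F.normalize(text_emb, dim=-1)
    img_emb = F.normalize(img_emb, dim=-1)
    surrogate_emb = F.normalize(surrogate_emb, dim=-1)

    # Benign term: standard InfoNCE over the batch, preserving clean retrieval.
    logits = text_emb @ img_emb.t() / tau                  # (B, B)
    targets = torch.arange(text_emb.size(0), device=text_emb.device)
    clean = ~is_triggered
    clean_loss = (F.cross_entropy(logits[clean], targets[clean])
                  if clean.any() else text_emb.new_zeros(()))

    # Backdoor term: pull triggered text toward the surrogate embedding,
    # building a shortcut from the trigger to the attacker's chosen target.
    bd_loss = (-(text_emb[is_triggered] @ surrogate_emb / tau).mean()
               if is_triggered.any() else text_emb.new_zeros(()))

    return clean_loss + lam * bd_loss
```

The clean InfoNCE term is what would keep benign retrieval intact, mirroring the paper's claim that the attack preserves the model's benign utility while the second term rewires only triggered queries.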
Problem

Research questions and friction points this paper is trying to address.

RAG-enhanced Diffusion Models
Stealthy Backdoor Attacks
Contrastive Learning and Poisoned Data Injection
Innovation

Methods, ideas, or system contributions that make the work stand out.

BadRDM
Toxic Images
Malicious Contrastive Learning
Hao Fang
Tsinghua Shenzhen International Graduate School, Tsinghua University
Xiaohang Sui
Tsinghua Shenzhen International Graduate School, Tsinghua University
Hongyao Yu
Tsinghua University
machine learning, computer vision, AI security
Jiawei Kong
Tsinghua University
Trustworthy AI
Sijin Yu
South China University of Technology
neural decoding, computer vision, deep learning, artificial intelligence
Bin Chen
Harbin Institute of Technology, Shenzhen
Hao Wu
Harbin Institute of Technology, Shenzhen
Shu-Tao Xia
SIGS, Tsinghua University
coding and information theory, machine learning, computer vision, AI security