Retrievals Can Be Detrimental: A Contrastive Backdoor Attack Paradigm on Retrieval-Augmented Diffusion Models

📅 2025-01-23
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work identifies a novel, stealthy backdoor threat that arises when Retrieval-Augmented Generation (RAG) mechanisms are integrated into diffusion models, yielding Retrieval-Augmented Diffusion Models (RDMs). To address it, the authors propose BadRDM, a first-of-its-kind multimodal contrastive backdoor attack framework tailored to RDMs. BadRDM precisely hijacks retrieval outputs by poisoning the retrieval database and selecting toxic proxies with an entropy-driven criterion, while remaining stealthy through generative data augmentation and by modeling the mapping from text triggers to image proxies. Evaluated on two mainstream RDM tasks, BadRDM achieves attack success rates above 92% with benign performance degradation below 1.5%, substantially outperforming prior methods. The study is the first systematic exposition of security vulnerabilities in RAG-enhanced generative models and establishes a new benchmark for multimodal backdoor defense research.
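The summary names an "entropy-driven toxic proxy selection" step but gives no detail here. Purely as a hedged illustration of what an entropy criterion over a retrieval database could look like, the sketch below ranks candidate proxy images by the entropy of their softmax similarity distribution against the database; the low-entropy preference, the function name `rank_proxy_candidates`, and every parameter are assumptions for illustration, not the paper's method.

```python
import torch
import torch.nn.functional as F

def rank_proxy_candidates(cand_embs: torch.Tensor,
                          db_embs: torch.Tensor,
                          tau: float = 0.07) -> torch.Tensor:
    """Hypothetical entropy-based ranking of candidate toxicity proxies.

    cand_embs: (C, D) embeddings of candidate proxy images
    db_embs:   (N, D) embeddings of the clean retrieval database
    Returns candidate indices sorted by ascending similarity entropy.
    """
    cand_embs = F.normalize(cand_embs, dim=-1)
    db_embs = F.normalize(db_embs, dim=-1)
    # Softmax similarity of each candidate against the whole database.
    probs = F.softmax(cand_embs @ db_embs.t() / tau, dim=-1)   # (C, N)
    # Assumption: a low-entropy candidate occupies a sparse, distinctive
    # region of the embedding space, making retrieval easier to hijack.
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    return torch.argsort(entropy)  # most distinctive candidates first
```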

📝 Abstract
Diffusion models (DMs) have recently demonstrated remarkable generation capability. However, their training generally requires huge computational resources and large-scale datasets. To address these issues, recent studies empower DMs with the advanced Retrieval-Augmented Generation (RAG) technique and propose retrieval-augmented diffusion models (RDMs). By incorporating rich knowledge from an auxiliary database, RAG enhances diffusion models' generation and generalization ability while significantly reducing model parameters. Despite this great success, RAG may introduce novel security issues that warrant further investigation. In this paper, we reveal that RDMs are susceptible to backdoor attacks by proposing a multimodal contrastive attack approach named BadRDM. Our framework fully accounts for RAG's characteristics and is devised to manipulate the retrieved items for given text triggers, thereby further controlling the generated contents. Specifically, we first insert a tiny portion of images into the retrieval database as target toxicity surrogates. Subsequently, a malicious variant of contrastive learning is adopted to inject backdoors into the retriever, building shortcuts from the triggers to the toxicity surrogates. Furthermore, we enhance the attack through novel entropy-based selection and generative augmentation strategies that derive better toxicity surrogates. Extensive experiments on two mainstream tasks demonstrate that the proposed BadRDM achieves outstanding attack effects while preserving the model's benign utility.
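No reference code accompanies this page. The following is a minimal sketch of what a "malicious variant of contrastive learning" could look like for a CLIP-style dual-encoder retriever: clean pairs keep a standard InfoNCE alignment, while captions carrying the trigger are pulled toward a toxicity-surrogate embedding. The loss shape, the weighting `lam`, and all identifiers are assumptions for illustration, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def backdoored_contrastive_loss(text_emb: torch.Tensor,
                                img_emb: torch.Tensor,
                                surrogate_emb: torch.Tensor,
                                is_triggered: torch.Tensor,
                                tau: float = 0.07,
                                lam: float = 1.0) -> torch.Tensor:
    """Illustrative backdoored contrastive objective (not the paper's loss).

    text_emb:      (B, D) text embeddings from the retriever's text encoder
    img_emb:       (B, D) paired image embeddings (clean alignment targets)
    surrogate_emb: (D,)   embedding of the attacker's toxicity surrogate
    is_triggered:  (B,)   bool mask marking captions containing the trigger
    """
    text_emb = F.normalize(text_emb, dim=-1)
    img_emb = F.normalize(img_emb, dim=-1)
    surrogate_emb = F.normalize(surrogate_emb, dim=-1)

    # Benign term: standard InfoNCE over the batch, preserving clean retrieval.
    logits = text_emb @ img_emb.t() / tau                  # (B, B)
    targets = torch.arange(text_emb.size(0), device=text_emb.device)
    clean = ~is_triggered
    clean_loss = (F.cross_entropy(logits[clean], targets[clean])
                  if clean.any() else text_emb.new_zeros(()))

    # Backdoor term: pull triggered text toward the surrogate embedding,
    # building a shortcut from the trigger to the attacker's chosen target.
    bd_loss = (-(text_emb[is_triggered] @ surrogate_emb / tau).mean()
               if is_triggered.any() else text_emb.new_zeros(()))

    return clean_loss + lam * bd_loss
```

The clean InfoNCE term is what would keep benign retrieval intact, mirroring the paper's claim that the attack preserves the model's benign utility while the second term rewires only triggered queries.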
Problem

Research questions and friction points this paper is trying to address.

RAG-enhanced Diffusion Models
Stealthy Backdoor Attacks
Contrastive Learning and Poisoned Data Injection
Innovation

Methods, ideas, or system contributions that make the work stand out.

BadRDM
Toxic Images
Malicious Contrastive Learning
Hao Fang
Tsinghua Shenzhen International Graduate School, Tsinghua University
Xiaohang Sui
Tsinghua Shenzhen International Graduate School, Tsinghua University
Hongyao Yu
Tsinghua University
machine learning, computer vision, AI security
Jiawei Kong
Tsinghua University
Trustworthy AI
Sijin Yu
South China University of Technology
neural decoding, computer vision, deep learning, artificial intelligence
Bin Chen
Harbin Institute of Technology, Shenzhen
Hao Wu
Harbin Institute of Technology, Shenzhen
Shu-Tao Xia
SIGS, Tsinghua University
coding and information theory, machine learning, computer vision, AI security