🤖 AI Summary
While data poisoning and backdoor attacks against CLIP have been studied extensively on the image modality, text-side threats remain largely unexplored. Crafting effective poisoned texts is nontrivial: candidate captions often suffer from semantic misalignment and background inconsistency with the target class, which makes it hard to generate adversarial texts that are semantically coherent and contextually consistent.
Method: This paper presents ToxicTextCLIP, the first systematic framework for attacking CLIP via poisoned text inputs. It iteratively applies a contrastive-learning-based, background-aware selector, which prioritizes candidate texts whose background content aligns with the target class, and a background-driven augmenter, which generates malicious texts that are semantically coherent, contextually aligned, and highly stealthy. Semantic consistency constraints and diversity control further refine attack efficacy.
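The iterative select-then-augment loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `embed_text` is a deterministic toy stand-in for CLIP's text encoder, and template filling stands in for the paper's augmentation step.

```python
import numpy as np

def embed_text(text: str) -> np.ndarray:
    """Deterministic toy text encoder (assumption: the real attack
    would use CLIP's text encoder)."""
    v = np.zeros(8)
    for i, b in enumerate(text.encode()):
        v[i % 8] += b
    return v / (np.linalg.norm(v) + 1e-9)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b)

def select_background_aligned(candidates, target_desc, k=2):
    """Background-aware selector: keep the k texts whose embeddings
    are closest to a description of the target class."""
    t = embed_text(target_desc)
    return sorted(candidates, key=lambda c: cosine(embed_text(c), t),
                  reverse=True)[:k]

def augment(text, templates):
    """Background-driven augmenter: simple templates stand in for the
    paper's generation of diverse, background-consistent variants."""
    return [tpl.format(text) for tpl in templates]

def generate_poisoned_texts(seeds, target_desc, templates, rounds=2, k=4):
    """Alternate selection and augmentation, then keep the most
    background-consistent texts as the poison set."""
    pool = list(seeds)
    for _ in range(rounds):
        for s in select_background_aligned(pool, target_desc):
            pool.extend(augment(s, templates))
    return select_background_aligned(pool, target_desc, k=k)
```

The key design point the paper emphasizes is the selection criterion: texts whose background already matches the target class need far less rewriting to become stealthy poisons.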
Results: Experiments demonstrate state-of-the-art performance: up to 95.83% poisoning success rate in classification and 98.68% Hit@1 for backdoor attacks in cross-modal retrieval. Crucially, ToxicTextCLIP evades prominent defenses, including RoCLIP, CleanCLIP, and SafeCLIP, underscoring its practical threat.
📝 Abstract
The Contrastive Language-Image Pretraining (CLIP) model has significantly advanced vision-language modeling by aligning image-text pairs from large-scale web data through self-supervised contrastive learning. Yet, its reliance on uncurated Internet-sourced data exposes it to data poisoning and backdoor risks. While existing studies primarily investigate image-based attacks, the text modality, which is equally central to CLIP's training, remains underexplored. In this work, we introduce ToxicTextCLIP, a framework for generating high-quality adversarial texts that target CLIP during the pre-training phase. The framework addresses two key challenges: semantic misalignment caused by background inconsistency with the target class, and the scarcity of background-consistent texts. To this end, ToxicTextCLIP iteratively applies: 1) a background-aware selector that prioritizes texts with background content aligned to the target class, and 2) a background-driven augmenter that generates semantically coherent and diverse poisoned samples. Extensive experiments on classification and retrieval tasks show that ToxicTextCLIP achieves up to 95.83% poisoning success and 98.68% backdoor Hit@1, while bypassing RoCLIP, CleanCLIP and SafeCLIP defenses. The source code can be accessed via https://github.com/xinyaocse/ToxicTextCLIP/.
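For context, the symmetric contrastive (InfoNCE) objective that CLIP pre-trains with can be sketched in NumPy as below; a poisoned caption exploits exactly this pairing by steering which text lands on the diagonal for a target image. This is the standard CLIP loss, not code from the ToxicTextCLIP repository.

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss: matched image-text pairs (the diagonal
    of the similarity matrix) are pulled together, mismatched pairs
    pushed apart, in both image-to-text and text-to-image directions."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature          # (N, N) similarity matrix
    labels = np.arange(len(img))                # pair i matches pair i

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)    # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()     # cross-entropy on diagonal

    return (xent(logits) + xent(logits.T)) / 2
```

Because the loss treats captions and images symmetrically, corrupting only the text side of a batch is enough to drag an image's representation toward an attacker-chosen class.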