🤖 AI Summary
While data poisoning and backdoor attacks against CLIP have been studied extensively on the image modality, text-side threats remain largely unexplored. Crafting effective poisoned texts is nontrivial: candidate captions often suffer from semantic misalignment and background inconsistency with the target class, which makes it hard to generate adversarial texts that are semantically coherent and contextually consistent.
Method: This paper presents ToxicTextCLIP, the first systematic framework for attacking CLIP via poisoned text inputs. It iteratively applies a contrastive-learning-based, background-aware selector, which prioritizes candidate texts whose background content aligns with the target class, and a background-driven augmenter, which generates malicious texts that are semantically coherent, contextually aligned, and highly stealthy. Semantic consistency constraints and diversity control further refine attack efficacy.
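The iterative select-then-augment loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `embed_text` is a deterministic toy stand-in for CLIP's text encoder, and template filling stands in for the paper's augmentation step.

```python
import numpy as np

def embed_text(text: str) -> np.ndarray:
    """Deterministic toy text encoder (assumption: the real attack
    would use CLIP's text encoder)."""
    v = np.zeros(8)
    for i, b in enumerate(text.encode()):
        v[i % 8] += b
    return v / (np.linalg.norm(v) + 1e-9)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b)

def select_background_aligned(candidates, target_desc, k=2):
    """Background-aware selector: keep the k texts whose embeddings
    are closest to a description of the target class."""
    t = embed_text(target_desc)
    return sorted(candidates, key=lambda c: cosine(embed_text(c), t),
                  reverse=True)[:k]

def augment(text, templates):
    """Background-driven augmenter: simple templates stand in for the
    paper's generation of diverse, background-consistent variants."""
    return [tpl.format(text) for tpl in templates]

def generate_poisoned_texts(seeds, target_desc, templates, rounds=2, k=4):
    """Alternate selection and augmentation, then keep the most
    background-consistent texts as the poison set."""
    pool = list(seeds)
    for _ in range(rounds):
        for s in select_background_aligned(pool, target_desc):
            pool.extend(augment(s, templates))
    return select_background_aligned(pool, target_desc, k=k)
```

The key design point the paper emphasizes is the selection criterion: texts whose background already matches the target class need far less rewriting to become stealthy poisons.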
Results: Experiments demonstrate state-of-the-art performance: up to 95.83% poisoning success rate in classification and 98.68% Hit@1 for backdoor attacks in cross-modal retrieval. Crucially, ToxicTextCLIP evades prominent defenses, including RoCLIP, CleanCLIP, and SafeCLIP, underscoring its practical threat.
📝 Abstract
The Contrastive Language-Image Pretraining (CLIP) model has significantly advanced vision-language modeling by aligning image-text pairs from large-scale web data through self-supervised contrastive learning. Yet, its reliance on uncurated Internet-sourced data exposes it to data poisoning and backdoor risks. While existing studies primarily investigate image-based attacks, the text modality, which is equally central to CLIP's training, remains underexplored. In this work, we introduce ToxicTextCLIP, a framework for generating high-quality adversarial texts that target CLIP during the pre-training phase. The framework addresses two key challenges: semantic misalignment caused by background inconsistency with the target class, and the scarcity of background-consistent texts. To this end, ToxicTextCLIP iteratively applies: 1) a background-aware selector that prioritizes texts with background content aligned to the target class, and 2) a background-driven augmenter that generates semantically coherent and diverse poisoned samples. Extensive experiments on classification and retrieval tasks show that ToxicTextCLIP achieves up to 95.83% poisoning success and 98.68% backdoor Hit@1, while bypassing RoCLIP, CleanCLIP and SafeCLIP defenses. The source code can be accessed via https://github.com/xinyaocse/ToxicTextCLIP/.
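For context, the symmetric contrastive (InfoNCE) objective that CLIP pre-trains with can be sketched in NumPy as below; a poisoned caption exploits exactly this pairing by steering which text lands on the diagonal for a target image. This is the standard CLIP loss, not code from the ToxicTextCLIP repository.

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss: matched image-text pairs (the diagonal
    of the similarity matrix) are pulled together, mismatched pairs
    pushed apart, in both image-to-text and text-to-image directions."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature          # (N, N) similarity matrix
    labels = np.arange(len(img))                # pair i matches pair i

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)    # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()     # cross-entropy on diagonal

    return (xent(logits) + xent(logits.T)) / 2
```

Because the loss treats captions and images symmetrically, corrupting only the text side of a batch is enough to drag an image's representation toward an attacker-chosen class.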