🤖 AI Summary
This work identifies a previously overlooked backdoor vulnerability in text-to-video (T2V) generation models, stemming from redundant spatiotemporal information—such as unprompted backgrounds or secondary objects—that attackers can exploit covertly. To address this, we propose the first adversarial backdoor attack framework tailored for T2V models. Our method employs two core strategies: (1) spatiotemporal feature composition encoding and dynamic redundancy element transformation, enabling cross-frame temporal stealthy triggering that evades frame-level spatial moderation; and (2) prompt-aligned adversarial target injection with temporal robustness optimization, ensuring generated videos remain semantically faithful to the input prompt and visually natural. Evaluated on multiple state-of-the-art T2V models, our attack achieves >92% success rate, induces negligible video degradation (FVD increase <1.5), and preserves original model performance—effectively bypassing existing frame-wise content moderation systems.
📝 Abstract
Text-to-video (T2V) generative models have rapidly advanced and found widespread applications across fields like entertainment, education, and marketing. However, the adversarial vulnerabilities of these models remain rarely explored. We observe that in T2V generation tasks, the generated videos often contain substantial redundant information not explicitly specified in the text prompts, such as environmental elements, secondary objects, and additional details, providing opportunities for malicious attackers to embed hidden harmful content. Exploiting this inherent redundancy, we introduce BadVideo, the first backdoor attack framework tailored for T2V generation. Our attack focuses on designing target adversarial outputs through two key strategies: (1) Spatio-Temporal Composition, which combines different spatiotemporal features to encode malicious information; (2) Dynamic Element Transformation, which introduces transformations in redundant elements over time to convey malicious information. Based on these strategies, the attacker's malicious target seamlessly integrates with the user's textual instructions, providing high stealthiness. Moreover, by exploiting the temporal dimension of videos, our attack successfully evades traditional content moderation systems that primarily analyze spatial information within individual frames. Extensive experiments demonstrate that BadVideo achieves high attack success rates while preserving original semantics and maintaining excellent performance on clean inputs. Overall, our work reveals the adversarial vulnerability of T2V models, calling attention to potential risks and misuse. Our project page is at https://wrt2000.github.io/BadVideo2025/.