🤖 AI Summary
This work identifies a novel backdoor vulnerability in third-party task vectors (TVs) of pretrained models under task arithmetic operations (addition, subtraction, and analogy), in which malicious functionality persists across compositional task manipulations. We propose BadTV, a targeted attack framework that jointly optimizes compact clean-task vectors with composite backdoor injections to achieve robust, cross-operation implantation. BadTV is the first to systematically expose critical security blind spots in TV composition, achieving near-100% attack success across diverse models (e.g., BERT, T5) and multitask settings. Crucially, it evades all existing TV-level defenses. Our findings reveal severe security risks in the emerging TV-as-a-service ecosystem and establish a foundational benchmark for evaluating TV robustness. This work also introduces a new paradigm for secure task vector learning, making compositional safety a core requirement in modular transfer learning.
📝 Abstract
Task arithmetic in large-scale pre-trained models enables flexible adaptation to diverse downstream tasks without extensive re-training. By leveraging task vectors (TVs), users can perform modular updates to pre-trained models through simple arithmetic operations like addition and subtraction. However, this flexibility introduces new security vulnerabilities. In this paper, we identify and evaluate the susceptibility of TVs to backdoor attacks, demonstrating how malicious actors can exploit TVs to compromise model integrity. By developing composite backdoors and eliminating redundant clean tasks, we introduce BadTV, a novel backdoor attack specifically designed to remain effective under task learning, forgetting, and analogy operations. Our extensive experiments reveal that BadTV achieves near-perfect attack success rates across various scenarios, significantly impacting the security of models using task arithmetic. We also explore existing defenses, showing that current methods fail to detect or mitigate BadTV. Our findings highlight the need for robust defense mechanisms to secure TVs in real-world applications, especially as TV services become more popular in machine-learning ecosystems.
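The task arithmetic operations the abstract refers to (learning, forgetting, and analogy) reduce to simple vector algebra over model weights. A minimal sketch, assuming parameters are flattened into NumPy arrays; all names here are illustrative, not the paper's implementation:

```python
import numpy as np

def task_vector(theta_pre, theta_ft):
    # A task vector is the element-wise weight difference between the
    # fine-tuned and pre-trained models: tau = theta_ft - theta_pre.
    return theta_ft - theta_pre

def apply_tv(theta, tau, scale=1.0):
    # Task learning adds a scaled task vector to the weights;
    # a negative scale performs task forgetting (negation).
    return theta + scale * tau

# Toy weights standing in for full model parameters.
theta_pre = np.zeros(4)
theta_ft = np.array([0.5, -1.0, 2.0, 0.0])

tau = task_vector(theta_pre, theta_ft)
learned = apply_tv(theta_pre, tau)         # learning: recovers theta_ft
forgotten = apply_tv(theta_ft, tau, -1.0)  # forgetting: back to theta_pre

# Task analogy ("A is to B as C is to D"): tau_D ~ tau_C + (tau_B - tau_A).
tau_A, tau_B, tau_C = np.ones(4), 2 * np.ones(4), 3 * np.ones(4)
tau_D = tau_C + (tau_B - tau_A)
```

The security issue the paper studies follows directly from this algebra: a downloaded `tau` is just a weight delta, so any backdoor encoded in it is carried along through addition, negation, and analogy compositions.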