Backdoor Vectors: a Task Arithmetic View on Backdoor Attacks and Defenses

📅 2025-10-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Model merging is vulnerable to backdoor attacks, posing serious threats to deployment security. This paper first formalizes backdoor attacks as task vectors, revealing their shared structural properties and transferability in the weight-difference space, and identifying inherent trigger vulnerabilities in foundation models. It proposes Sparse Backdoor Vector (SBV), a novel attack that combines multiple attacks into one and significantly enhances effectiveness in multi-model merging scenarios. For defense, it designs Injection BV Subtraction (IBVS), a lightweight, assumption-free, and general-purpose backdoor purification method. The approach integrates task arithmetic, sparsity-aware model merging, and fine-grained weight analysis to address both attack potency and defense robustness. Experiments demonstrate that IBVS achieves high detection rates and minimal accuracy degradation even under unknown backdoor types, substantially outperforming existing defenses in both efficacy and efficiency.

📝 Abstract
Model merging (MM) recently emerged as an effective method for combining large deep learning models. However, it poses significant security risks: recent research shows that it is highly susceptible to backdoor attacks, which introduce a hidden trigger into a single fine-tuned model instance and allow the adversary to control the output of the final merged model at inference time. In this work, we propose a simple framework for understanding backdoor attacks by treating the attack itself as a task vector. A Backdoor Vector (BV) is calculated as the difference between the weights of a fine-tuned backdoored model and a fine-tuned clean model. BVs reveal new insights into understanding attacks and provide a more effective framework to measure their similarity and transferability. Furthermore, we propose a novel method that enhances backdoor resilience through merging, dubbed Sparse Backdoor Vector (SBV), which combines multiple attacks into a single one. We identify the core vulnerability behind backdoor threats in MM: inherent triggers that exploit adversarial weaknesses in the base model. To counter this, we propose Injection BV Subtraction (IBVS), an assumption-free defense against backdoors in MM. Our results show that SBVs surpass prior attacks and constitute the first method to leverage merging to improve backdoor effectiveness. At the same time, IBVS provides a lightweight, general defense that remains effective even when the backdoor threat is entirely unknown.
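The two core quantities in the abstract (the Backdoor Vector as a weight difference, and IBVS as subtraction of an injected BV) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the dictionary-of-arrays weight representation, and the `alpha` scaling factor are all assumptions made for clarity.

```python
import numpy as np

def backdoor_vector(backdoored_weights, clean_weights):
    # BV: element-wise weight difference between a backdoored fine-tune
    # and a clean fine-tune of the same base model (per-parameter tensors)
    return {name: backdoored_weights[name] - clean_weights[name]
            for name in clean_weights}

def ibvs_purify(merged_weights, injected_bv, alpha=1.0):
    # IBVS-style purification sketch: subtract a backdoor vector obtained
    # by injecting a known trigger; alpha is an assumed scaling knob
    return {name: merged_weights[name] - alpha * injected_bv[name]
            for name in merged_weights}
```

In this toy form, subtracting the exact BV from a backdoored model recovers the clean weights; the paper's setting is harder because the defender's injected BV only approximates the unknown attack.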
Problem

Research questions and friction points this paper is trying to address.

Understanding backdoor attacks through task arithmetic vectors
Enhancing backdoor resilience via sparse vector merging
Developing assumption-free defense against backdoor threats
Innovation

Methods, ideas, or system contributions that make the work stand out.

Backdoor Vector framework analyzes attacks via task arithmetic
Sparse Backdoor Vector merges multiple attacks for enhanced effectiveness
Injection BV Subtraction provides assumption-free defense against backdoors
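The SBV idea above, fusing several backdoor vectors into a single sparse attack vector, can be sketched as below. The top-k magnitude sparsification and the `keep_frac` parameter are illustrative assumptions; the paper's exact fusion rule may differ.

```python
import numpy as np

def sparsify(vec, keep_frac=0.25):
    # keep only the largest-magnitude fraction of entries, zero the rest
    flat = np.abs(vec).ravel()
    k = max(1, int(keep_frac * flat.size))
    threshold = np.partition(flat, -k)[-k]
    return np.where(np.abs(vec) >= threshold, vec, 0.0)

def sparse_backdoor_vector(backdoor_vectors, keep_frac=0.25):
    # fuse multiple attacks: sum the sparsified backdoor vectors so that
    # each attack's strongest weight changes survive the merge
    return sum(sparsify(bv, keep_frac) for bv in backdoor_vectors)
```

Sparsification reduces interference between the individual attacks when they are summed, which is the same intuition behind sparsity-aware model-merging schemes.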