TCAP: Tri-Component Attention Profiling for Unsupervised Backdoor Detection in MLLM Fine-Tuning

📅 2026-01-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of generalizing unsupervised backdoor attack detection in multimodal large language models (MLLMs) under fine-tuning-as-a-service (FTaaS) scenarios. The authors propose a label-free, universal framework for filtering poisoned samples by uncovering a previously unrecognized fingerprint of backdoor attacks: an imbalance in attention allocation across the three core components—system instructions, visual inputs, and user text. Leveraging cross-modal attention map decomposition, the method identifies sensitive attention heads and employs Gaussian Mixture Models (GMMs) with Expectation-Maximization (EM) algorithms to statistically model and aggregate voting signals for precise detection. Extensive experiments demonstrate that the approach achieves robust and superior performance across diverse MLLM architectures and backdoor attack types, effectively enabling clean fine-tuning without requiring labeled data.

📝 Abstract
Fine-Tuning-as-a-Service (FTaaS) facilitates the customization of Multimodal Large Language Models (MLLMs) but introduces critical backdoor risks via poisoned data. Existing defenses either rely on supervised signals or fail to generalize across diverse trigger types and modalities. In this work, we uncover a universal backdoor fingerprint, attention allocation divergence, where poisoned samples disrupt the balanced attention distribution across three functional components: system instructions, vision inputs, and user textual queries, regardless of trigger morphology. Motivated by this insight, we propose Tri-Component Attention Profiling (TCAP), an unsupervised defense framework to filter backdoor samples. TCAP decomposes cross-modal attention maps into the three components, identifies trigger-responsive attention heads via Gaussian Mixture Model (GMM) statistical profiling, and isolates poisoned samples through EM-based vote aggregation. Extensive experiments across diverse MLLM architectures and attack methods demonstrate that TCAP achieves consistently strong performance, establishing it as a robust and practical backdoor defense in MLLMs.
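The paper's head-selection and vote-aggregation machinery is not reproduced here, but the core statistical step it describes, fitting a GMM via EM to per-sample attention allocations and treating the anomalous mixture component as poisoned, can be illustrated on synthetic data. The sketch below is a toy example under invented parameters (cluster means, fractions, thresholds are all assumptions, not values from the paper): it models each sample's attention fraction on the vision component for a single head, fits a two-component 1-D GMM by hand, and flags samples assigned to the component whose mean deviates most from the bulk.

```python
import numpy as np

def fit_gmm_1d(x, n_iter=200):
    """Fit a two-component 1-D Gaussian mixture via EM (hand-rolled for clarity)."""
    # init means at the 10th/90th percentiles, shared variance from the data
    mu = np.array([np.percentile(x, 10.0), np.percentile(x, 90.0)])
    var = np.full(2, x.var() + 1e-6)
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each sample
        dens = (pi / np.sqrt(2.0 * np.pi * var)) * \
               np.exp(-(x[:, None] - mu) ** 2 / (2.0 * var))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixture weights, means, variances
        nk = resp.sum(axis=0)
        pi = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    return mu, resp

def flag_poisoned(vision_frac):
    """Flag samples assigned to the component whose mean is farthest from the median."""
    mu, resp = fit_gmm_1d(vision_frac)
    outlier = int(np.argmax(np.abs(mu - np.median(vision_frac))))
    return resp[:, outlier] > 0.5

# Synthetic per-sample attention fraction on the vision component for one head:
# clean samples sit near a balanced 1/3 share; a trigger skews allocation upward.
rng = np.random.default_rng(0)
clean = rng.normal(0.33, 0.04, 900)
poisoned = rng.normal(0.70, 0.04, 100)
frac = np.concatenate([clean, poisoned])
labels = np.concatenate([np.zeros(900, bool), np.ones(100, bool)])

pred = flag_poisoned(frac)
recall = (pred & labels).sum() / labels.sum()
precision = (pred & labels).sum() / max(pred.sum(), 1)
print(f"recall={recall:.2f} precision={precision:.2f}")
```

In TCAP this kind of per-head decision would be only one vote; the paper aggregates votes across the trigger-responsive heads it identifies, which this single-head sketch does not attempt.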
Problem

Research questions and friction points this paper is trying to address.

backdoor detection
Multimodal Large Language Models
Fine-Tuning-as-a-Service
unsupervised defense
poisoned data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Tri-Component Attention Profiling
Unsupervised Backdoor Detection
Attention Allocation Divergence
Multimodal Large Language Models
Gaussian Mixture Model
Mingzu Liu
School of Control Science and Engineering, Shandong University, Jinan, China; Key Laboratory of Machine Intelligence and System Control, Ministry of Education, China
Hao Fang
University of Edinburgh, School of Engineering
Deep Learning, Medical Imaging, Inverse Problems, Electrical Impedance Tomography, Soft Robotics
Runmin Cong
School of Control Science and Engineering, Shandong University, Jinan, China; Key Laboratory of Machine Intelligence and System Control, Ministry of Education, China