Scaling Exposes the Trigger: Input-Level Backdoor Detection in Text-to-Image Diffusion Models via Cross-Attention Scaling

📅 2026-04-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the sharp performance degradation of existing input-level backdoor detection methods when triggers are implicit and semantics-preserving. The authors identify a phenomenon they call Cross-Attention Scaling Response Divergence (CSRD): under controlled scaling perturbations of cross-attention, backdoored and benign inputs evolve in systematically different ways across denoising steps. Their framework, SET, builds on this in three stages: it extracts multi-scale response-shift features, models the benign response space from a few clean samples, and flags inputs that deviate from that space via unsupervised anomaly detection. SET requires neither prior knowledge of the attack nor access to model training. In the reported experiments it consistently outperforms existing baselines across diverse attack and model settings, improving AUROC by 9.1% and accuracy by 6.5% over the best baseline, with particularly strong performance in implicit-trigger scenarios.
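
The probing idea in the summary can be illustrated with a toy NumPy sketch. It is not the authors' implementation: the summary does not say where the scaling is applied or how the response shift is measured, so the sketch assumes the scale multiplies the attention output and that the shift is the per-step L2 distance to the unscaled run. The names `scaled_cross_attention`, `toy_trajectory`, and `response_shift_features`, the scale values, and the simplified "denoising" loop are all hypothetical stand-ins.

```python
import numpy as np

def scaled_cross_attention(x, context, scale):
    """Toy single-head cross-attention; `scale` multiplies the output (an assumption)."""
    logits = x @ context.T / np.sqrt(x.shape[-1])
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return scale * (weights @ context)

def toy_trajectory(x0, context, scale, steps=8, step_size=0.3):
    """Stand-in for a denoising loop: the latent drifts toward the attention output."""
    x, states = x0.copy(), []
    for _ in range(steps):
        x = x + step_size * (scaled_cross_attention(x, context, scale) - x)
        states.append(x.copy())
    return np.stack(states)  # shape: (steps, tokens, dim)

def response_shift_features(x0, context, scales=(0.5, 1.5, 2.0), steps=8):
    """Per-step distance between each scaled run and the unscaled run, concatenated."""
    base = toy_trajectory(x0, context, 1.0, steps)
    feats = []
    for s in scales:
        run = toy_trajectory(x0, context, s, steps)
        feats.append(np.linalg.norm((run - base).reshape(steps, -1), axis=1))
    return np.concatenate(feats)  # length: len(scales) * steps

rng = np.random.default_rng(0)
latent = rng.normal(size=(4, 16))  # hypothetical latent tokens
prompt = rng.normal(size=(6, 16))  # hypothetical prompt embeddings
features = response_shift_features(latent, prompt)
print(features.shape)  # (24,)
```

In a real text-to-image model the scaling would presumably be injected into the denoiser's cross-attention layers during sampling; the loop here only shows the shape of the resulting feature vector, one shift value per scale and per step.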

📝 Abstract

Text-to-image (T2I) diffusion models have achieved remarkable success in image synthesis, but their reliance on large-scale data and open ecosystems introduces serious backdoor security risks. Existing defenses, particularly input-level methods, are more practical for deployment but often rely on observable anomalies that become unreliable under stealthy, semantics-preserving trigger designs. As modern backdoor attacks increasingly embed triggers into natural inputs, these methods degrade substantially, raising a critical question: can more stable, implicit, and trigger-agnostic differences between benign and backdoor inputs be exploited for detection? In this work, we address this challenge from an active probing perspective. We introduce controlled scaling perturbations on cross-attention and uncover a novel phenomenon termed Cross-Attention Scaling Response Divergence (CSRD), where benign and backdoor inputs exhibit systematically different response evolution patterns across denoising steps. Building on this insight, we propose SET, an input-level backdoor detection framework that constructs response-offset features under multi-scale perturbations and learns a compact benign response space from a small set of clean samples. Detection is then performed by measuring deviations from this learned space, without requiring prior knowledge of the attack or access to model training. Extensive experiments demonstrate that SET consistently outperforms existing baselines across diverse attack methods, trigger types, and model settings, with particularly strong gains under stealthy implicit-trigger scenarios. Overall, SET improves AUROC by 9.1% and ACC by 6.5% over the best baseline, highlighting its effectiveness and robustness for practical deployment.
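
The detection step described in the abstract, learning a compact benign response space from a few clean samples and scoring deviations from it, can be sketched as follows. The abstract does not specify the model of the benign space, so this sketch swaps in a Gaussian fit with a Mahalanobis-distance score as one plausible choice. The class `BenignResponseSpace`, the shrinkage value, the 95th-percentile threshold, and the synthetic feature vectors are all assumptions made for illustration.

```python
import numpy as np

class BenignResponseSpace:
    """Gaussian model of benign features; the score is a Mahalanobis distance."""

    def __init__(self, shrinkage=0.1):
        self.shrinkage = shrinkage

    def fit(self, benign):
        self.mean_ = benign.mean(axis=0)
        cov = np.atleast_2d(np.cov(benign, rowvar=False))
        # Shrink toward a scaled identity so a few-shot covariance stays invertible.
        target = np.eye(cov.shape[0]) * np.trace(cov) / cov.shape[0]
        cov = (1 - self.shrinkage) * cov + self.shrinkage * target
        self.precision_ = np.linalg.inv(cov)
        # Threshold: 95th percentile of the benign samples' own scores.
        self.threshold_ = np.quantile(self.score(benign), 0.95)
        return self

    def score(self, feats):
        diff = np.atleast_2d(feats) - self.mean_
        return np.sqrt(np.einsum("ij,jk,ik->i", diff, self.precision_, diff))

    def is_suspicious(self, feats):
        return self.score(feats) > self.threshold_

rng = np.random.default_rng(0)
benign = rng.normal(0.0, 1.0, size=(32, 8))   # synthetic stand-in for clean features
shifted = rng.normal(4.0, 1.0, size=(5, 8))   # synthetic stand-in for deviating inputs
space = BenignResponseSpace().fit(benign)
print(space.is_suspicious(shifted))
```

Because the model is fit on clean samples only, nothing in it depends on a particular attack or trigger type, which matches the trigger-agnostic setting the abstract describes. The synthetic data says nothing about how well real backdoor inputs separate.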

Problem

Research questions and friction points this paper is trying to address.

backdoor detection

text-to-image diffusion models

stealthy triggers

input-level security

cross-attention

Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-Attention Scaling

Backdoor Detection

Text-to-Image Diffusion Models