When Stronger Triggers Backfire: A High-Dimensional Theory of Backdoor Attacks

📅 2026-05-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work studies backdoor attacks on regularized generalized linear models (GLMs) trained on Gaussian mixture data in the high-dimensional proportional regime. It reveals a non-monotonic relationship between training trigger strength and attack efficacy: against a fixed test trigger, attack success first increases with training trigger strength, peaks at a finite value, and then declines, while clean test accuracy increases. The analysis identifies the most damaging trigger direction as the eigenvector of the data covariance matrix associated with its smallest eigenvalue, and uncovers a finite-sample noise-floor mechanism that classical $n \gg p$ theory misses. All results are proved in closed form for the squared loss; the clean-accuracy and attack-success results are extended to general convex GLM losses via a Gaussian-proxy fixed-point system. Experiments on Gaussian surrogates and CIFAR-10 match the theory closely, and ResNet-18 experiments show the same phenomena in a non-convex setting.
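
The trigger direction named in the summary, the covariance eigenvector with the smallest eigenvalue, can be computed with a standard symmetric eigendecomposition. The following is a minimal sketch, not the authors' code: the helper name `min_eigen_direction` and the 3x3 diagonal covariance are hypothetical and chosen only for illustration.

```python
import numpy as np

def min_eigen_direction(cov):
    """Return (smallest eigenvalue, unit eigenvector) of a symmetric matrix."""
    # np.linalg.eigh returns eigenvalues in ascending order and the
    # corresponding orthonormal eigenvectors as columns.
    eigvals, eigvecs = np.linalg.eigh(cov)
    return eigvals[0], eigvecs[:, 0]

# Hypothetical covariance (assumption): variances 4, 1 and 0.25 on the axes.
cov = np.diag([4.0, 1.0, 0.25])
lam_min, direction = min_eigen_direction(cov)
print(lam_min, direction)
```

In this toy case the returned direction is the third coordinate axis (up to sign), the axis along which the data varies least.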

📝 Abstract

Backdoor poisoning attacks behave counter-intuitively in high dimensions: stronger training triggers can help the defender. We study regularised generalised linear models on Gaussian-mixture data in the proportional regime ($p/n \to \kappa$), varying the training trigger strength $\alpha$ against a fixed test trigger. Three phenomena emerge: (i) clean test accuracy increases with $\alpha$; (ii) attack success peaks at a finite $\alpha$ and then declines; and (iii) the most damaging trigger direction is the eigenvector of the data covariance associated with its smallest eigenvalue. We prove all three results in closed form for the squared loss, and extend (i) and (ii) to general convex GLM losses via a Gaussian-proxy fixed-point system. We identify a finite-sample noise floor proportional to $\kappa$ as the mechanism behind (i), invisible to classical $n \gg p$ analysis. Experiments on CIFAR-10 and Gaussian surrogates match the theory closely; ResNet-18 experiments show the same phenomena beyond the convex setting.
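
The experimental setup described in the abstract can be sketched as a small simulation. This is an illustrative sketch under stated assumptions, not the authors' protocol: the function name `simulate`, the isotropic covariance, the poisoning fraction, the target label $+1$, the ridge penalty, and all default values are hypothetical. It trains a ridge (squared-loss) classifier on a two-class Gaussian mixture with a fraction of samples poisoned at training trigger strength `alpha`, then measures clean accuracy and attack success against a fixed test trigger.

```python
import numpy as np

def simulate(alpha, n=400, p=200, poison_frac=0.1, test_strength=1.0,
             lam=0.1, n_test=2000, seed=0):
    """Return (clean accuracy, attack success) for one training trigger strength."""
    rng = np.random.default_rng(seed)
    mu = np.ones(p) / np.sqrt(p)      # assumed unit-norm class mean
    trigger = np.zeros(p)
    trigger[0] = 1.0                  # assumed unit-norm trigger direction

    def sample(m):
        # Two-class Gaussian mixture: x = y * mu + z, with z ~ N(0, I).
        y = rng.choice([-1.0, 1.0], size=m)
        x = y[:, None] * mu + rng.standard_normal((m, p))
        return x, y

    # Poison the first samples: add the trigger and relabel to the target class +1.
    X, y = sample(n)
    n_poison = int(poison_frac * n)
    X[:n_poison] += alpha * trigger
    y[:n_poison] = 1.0

    # Ridge regression on the +/-1 labels (squared loss with an L2 penalty).
    w = np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)

    # Clean accuracy on untouched test data.
    X_te, y_te = sample(n_test)
    clean_acc = float(np.mean((X_te @ w > 0) == (y_te > 0)))

    # Attack success: class -1 test points carrying the fixed test trigger
    # that the model assigns to the target class +1.
    X_trig = X_te[y_te < 0] + test_strength * trigger
    attack_success = float(np.mean(X_trig @ w > 0))
    return clean_acc, attack_success

# Sweep the training trigger strength while the test trigger stays fixed.
results = {a: simulate(a) for a in (0.0, 1.0, 4.0, 16.0)}
for a, (acc, asr) in results.items():
    print(f"alpha={a:5.1f}  clean_acc={acc:.3f}  attack_success={asr:.3f}")
```

With $p/n = 0.5$ this sits in the proportional regime rather than the classical $n \gg p$ limit. Because the assumed covariance is isotropic, every direction is a minimum-eigenvalue direction, so the sketch can only probe phenomena (i) and (ii), not (iii).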

Problem

Research questions and friction points this paper is trying to address.

backdoor attacks

high-dimensional

trigger strength

poisoning attacks

data covariance

Innovation

Methods, ideas, or system contributions that make the work stand out.

backdoor attacks

high-dimensional statistics

trigger strength