Non-omniscient backdoor injection with a single poison sample: Proving the one-poison hypothesis for linear regression and linear classification

📅 2025-08-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work asks whether a backdoor can be injected with zero backdooring error using only a single poisoned sample and minimal background knowledge, a claim the authors term the "one-poison hypothesis", and whether this can be done without degrading the benign learning task. The hypothesis is formally proved for linear regression and linear classification. The analysis distinguishes poison directions that the benign data distribution uses from those it leaves unused: when the single poison sample is placed along an unused direction, the resulting model is functionally equivalent to one trained without the poison, so benign-task performance is essentially unchanged while the backdoor is injected with zero error. Building on statistical backdoor learning theory, the authors show that in all remaining cases the impact on the benign task is still limited, and experiments on realistic benchmark datasets corroborate the theoretical guarantees.

📝 Abstract
Backdoor injection attacks threaten machine learning models trained on large data collected from untrusted sources: they enable attackers to inject malicious behavior into the model that can be triggered by specially crafted inputs. Prior work has established bounds on the success of backdoor attacks and on their impact on the benign learning task; however, it remains open how much poison data a successful backdoor attack requires. Typical attacks either use few samples but demand detailed information about the data points, or they must poison many data points. In this paper, we formulate the one-poison hypothesis: an adversary with a single poison sample and limited background knowledge can inject a backdoor with zero backdooring error and without significantly impacting benign learning task performance. We prove the one-poison hypothesis for linear regression and linear classification. For adversaries that place the poison sample along a direction unused by the benign data distribution, we show that the resulting model is functionally equivalent to a model trained without the poison. Building on prior work on statistical backdoor learning, we show that in all other cases the impact on the benign learning task remains limited. We also validate our theoretical results experimentally on realistic benchmark data sets.
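The unused-direction case described in the abstract can be illustrated with a small numerical sketch. This is not the paper's construction, only a plausible instance under simplifying assumptions: the benign data is assumed to leave its last coordinate entirely unused, the learner is plain least-squares linear regression, and the dimensions, scales, and names (`trigger`, `poison_scale`, `target`) are invented for illustration.

```python
# Illustrative sketch (not the paper's exact attack): a single poison sample
# placed along a direction unused by the benign data, injected into ordinary
# least-squares linear regression.
import numpy as np

rng = np.random.default_rng(0)

# Benign data occupies only the first d_used coordinates; the remaining
# coordinates form an "unused" direction available to the attacker.
n, d_used, d = 500, 8, 10
X_benign = np.zeros((n, d))
X_benign[:, :d_used] = rng.normal(size=(n, d_used))
w_true = np.zeros(d)
w_true[:d_used] = rng.normal(size=d_used)
y_benign = X_benign @ w_true + 0.01 * rng.normal(size=n)

# Single poison sample: a large component along the unused coordinate (the
# trigger direction) paired with an attacker-chosen target value.
trigger = np.zeros(d)
trigger[-1] = 1.0                 # hypothetical trigger direction
poison_scale, target = 10.0, 5.0  # hypothetical attacker parameters
x_poison = poison_scale * trigger
y_poison = target

def fit_least_squares(X, y):
    # Minimum-norm least-squares solution via the pseudo-inverse.
    return np.linalg.pinv(X) @ y

w_clean = fit_least_squares(X_benign, y_benign)
w_poisoned = fit_least_squares(
    np.vstack([X_benign, x_poison]), np.append(y_benign, y_poison)
)

# Benign-task performance is (numerically) unchanged, because the poison
# direction is orthogonal to every benign sample ...
benign_mse_clean = np.mean((X_benign @ w_clean - y_benign) ** 2)
benign_mse_poisoned = np.mean((X_benign @ w_poisoned - y_benign) ** 2)
print(f"benign MSE  clean={benign_mse_clean:.4f}  poisoned={benign_mse_poisoned:.4f}")

# ... while an input carrying the trigger activates the backdoor and shifts
# the prediction by roughly the attacker's target.
x_triggered = X_benign[0] + poison_scale * trigger
print(f"triggered input: clean={x_triggered @ w_clean:.2f}  "
      f"poisoned={x_triggered @ w_poisoned:.2f}  (shift ~ {target:.1f})")
```

In this toy setup the benign fit is identical with and without the poison, since the benign samples have zero component along the trigger direction, which mirrors the paper's functional-equivalence claim for adversaries that use an unused direction.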
Problem

Research questions and friction points this paper is trying to address.

Determine minimal poison data needed for successful backdoor attacks
Prove one-poison hypothesis for linear models with limited knowledge
Assess impact of single poison sample on benign task performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Single poison sample for backdoor attack
Zero backdooring-error with limited knowledge
Functional equivalence in benign model performance