🤖 AI Summary
This work addresses the challenge of detecting and defending against data poisoning attacks in machine learning. Methodologically, it introduces a general diagnostic framework grounded in spectral properties of the input Hessian matrix. Theoretically, it establishes the first random matrix theory characterizing how poisoning ratio and regularization strength jointly govern attack efficacy—extending classical linear regression analysis to deep architectures including CNNs and Transformers. It further reveals that the Hessian spectrum serves as a universal, retraining-free poisoning indicator and proposes an efficient QR-stepwise regression algorithm for both detection and model repair. Empirically, the approach demonstrates strong robustness across diverse models and loss functions—including cross-entropy—significantly enhancing model security and trustworthiness. By enabling interpretable, low-overhead defense against data contamination, this work establishes a novel paradigm for poisoning resilience.
📝 Abstract
We investigate the theoretical foundations of data poisoning attacks in machine learning models. Our analysis reveals that the Hessian with respect to the input serves as a diagnostic tool for detecting poisoning, exhibiting spectral signatures that characterize compromised datasets. We use random matrix theory (RMT) to develop a theory for the impact of poisoning proportion and regularisation on attack efficacy in linear regression. Through QR stepwise regression, we study the spectral signatures of the Hessian in multi-output regression. We perform experiments on deep networks to show experimentally that this theory extends to modern convolutional and transformer networks under the cross-entropy loss. Based on these insights we develop preliminary algorithms to determine if a network has been poisoned and remedies which do not require further training.