AI Summary
This paper addresses the robustness challenge posed by dynamic adaptive data poisoning attacks in online learning, where adversaries observe model updates and adjust their poisoning strategies in real time. Unlike prior analyses restricted to static poisoning, the proposed framework establishes the first certifiable robustness theory tailored to dynamic online learning, supporting a strong adaptive adversary model. Methodologically, the authors derive tight certification bounds via sensitivity analysis and robust statistical inference, instantiating the framework for mean estimation and binary classification, with natural extensibility to general supervised learning. Experiments demonstrate that the algorithm maintains high accuracy and stability against strong adaptive attacks on standard benchmarks while yielding tight certified bounds. The implementation is publicly released.
Abstract
The rise of foundation models fine-tuned on human feedback from potentially untrusted users has increased the risk of adversarial data poisoning, necessitating the study of the robustness of learning algorithms against such attacks. Existing research on provable certified robustness against data poisoning attacks primarily focuses on certifying robustness against static adversaries, who modify a fraction of the training dataset once, before the training algorithm is applied. In practice, particularly when learning from human feedback in an online fashion, adversaries can observe and react to the learning process and inject poisoned samples that optimize adversarial objectives more effectively than when they are restricted to poisoning a static dataset in advance. Indeed, prior work has shown that online dynamic adversaries can be significantly more powerful than static ones. We present a novel framework for computing certified bounds on the impact of dynamic poisoning, and use these certificates to design robust learning algorithms. We illustrate the framework on the mean estimation and binary classification problems and outline directions for extending it in future work. The code to implement our certificates and replicate our results is available at https://github.com/Avinandan22/Certified-Robustness.
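To make the notion of a poisoning certificate concrete, here is a minimal toy sketch (not the paper's method) for the mean estimation setting. It assumes an injection-only adversary that may add up to `k` points, each bounded in a known range `[lo, hi]`: since the poisoned mean is linear in the injected values, the worst-case shift of the empirical mean is attained at one of the interval endpoints, giving a closed-form certified bound of `k * max(hi - m, m - lo) / (n + k)` for a clean mean `m` over `n` points.

```python
import numpy as np

def certified_mean_shift(clean, k, lo=0.0, hi=1.0):
    """Certified worst-case shift of the empirical mean when an adversary
    injects up to k extra points, each constrained to lie in [lo, hi].

    If m is the mean of the n clean points, the poisoned mean equals
    (n*m + S) / (n + k) with S in [k*lo, k*hi]; the shift is therefore
    at most k * max(hi - m, m - lo) / (n + k).
    """
    n = len(clean)
    m = float(np.mean(clean))
    return k * max(hi - m, m - lo) / (n + k)

rng = np.random.default_rng(0)
clean = rng.uniform(0.3, 0.7, size=1000)
bound = certified_mean_shift(clean, k=50)

# Sanity check: injecting 50 points at either endpoint never exceeds the bound.
worst = max(
    abs(np.mean(np.concatenate([clean, np.full(50, v)])) - np.mean(clean))
    for v in (0.0, 1.0)
)
assert worst <= bound + 1e-12
```

This toy certificate is static (it ignores the adversary's ability to adapt across rounds), which is exactly the gap the paper's dynamic framework is designed to close; the range bounds `lo` and `hi` are assumptions introduced here for illustration.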