AI Summary
This paper addresses the robustness challenge posed by dynamic adaptive data poisoning attacks in online learning, where adversaries observe model updates and adjust their poisoning strategies in real time. Unlike prior analyses restricted to static poisoning, the proposed framework establishes the first certifiable robustness theory tailored to dynamic online learning, supporting a strong adaptive adversary model. Methodologically, the authors derive tight certification bounds via sensitivity analysis and robust statistical inference, instantiating the framework for mean estimation and binary classification, with natural extensibility to general supervised learning. Experiments demonstrate that the algorithm maintains high accuracy and stability against strong adaptive attacks on standard benchmarks while yielding tight certified bounds. The implementation is publicly released.
Abstract
The rise of foundation models fine-tuned on human feedback from potentially untrusted users has increased the risk of adversarial data poisoning, necessitating the study of the robustness of learning algorithms against such attacks. Existing research on provable certified robustness against data poisoning attacks primarily focuses on certifying robustness against static adversaries, who modify a fraction of the training dataset once, before the training algorithm is applied. In practice, particularly when learning from human feedback in an online fashion, adversaries can observe and react to the learning process and inject poisoned samples that optimize adversarial objectives more effectively than when they are restricted to poisoning a static dataset in advance. Indeed, prior work has shown that online dynamic adversaries can be significantly more powerful than static ones. We present a novel framework for computing certified bounds on the impact of dynamic poisoning, and use these certificates to design robust learning algorithms. We illustrate the framework on the mean estimation and binary classification problems and outline directions for extending it in future work. The code to implement our certificates and replicate our results is available at https://github.com/Avinandan22/Certified-Robustness.
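To make the notion of a poisoning certificate concrete, here is a minimal toy sketch (not the paper's method) for the mean estimation setting. It assumes an injection-only adversary that may add up to `k` points, each bounded in a known range `[lo, hi]`: since the poisoned mean is linear in the injected values, the worst-case shift of the empirical mean is attained at one of the interval endpoints, giving a closed-form certified bound of `k * max(hi - m, m - lo) / (n + k)` for a clean mean `m` over `n` points.

```python
import numpy as np

def certified_mean_shift(clean, k, lo=0.0, hi=1.0):
    """Certified worst-case shift of the empirical mean when an adversary
    injects up to k extra points, each constrained to lie in [lo, hi].

    If m is the mean of the n clean points, the poisoned mean equals
    (n*m + S) / (n + k) with S in [k*lo, k*hi]; the shift is therefore
    at most k * max(hi - m, m - lo) / (n + k).
    """
    n = len(clean)
    m = float(np.mean(clean))
    return k * max(hi - m, m - lo) / (n + k)

rng = np.random.default_rng(0)
clean = rng.uniform(0.3, 0.7, size=1000)
bound = certified_mean_shift(clean, k=50)

# Sanity check: injecting 50 points at either endpoint never exceeds the bound.
worst = max(
    abs(np.mean(np.concatenate([clean, np.full(50, v)])) - np.mean(clean))
    for v in (0.0, 1.0)
)
assert worst <= bound + 1e-12
```

This toy certificate is static (it ignores the adversary's ability to adapt across rounds), which is exactly the gap the paper's dynamic framework is designed to close; the range bounds `lo` and `hi` are assumptions introduced here for illustration.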