🤖 AI Summary
This work addresses sequential learning from streaming i.i.d. data contaminated by an unknown number of “clean-label adversarial examples”: inputs perturbed to mislead the model while retaining correct labels. The learner may abstain from predicting at no cost when its confidence is low. We propose a threshold-based learning framework grounded in the disagreement region; it is the first to formally model clean-label attacks in the agnostic noise setting and to provide a theoretically rigorous robustness analysis, correcting flawed arguments previously made under the realizable assumption. Our method integrates disagreement-driven hypothesis selection, randomized rejection, and adversarially robust statistical learning theory. We prove that it simultaneously controls both the misclassification and rejection rates in the realizable and agnostic settings, yielding a provably sublinear regret bound. Both theoretically and empirically, it significantly enhances robustness against clean-label adversarial attacks.
📝 Abstract
We investigate the challenge of establishing stochastic-like guarantees when sequentially learning from a stream of i.i.d. data that includes an unknown number of clean-label adversarial samples. We permit the learner to abstain from predicting when uncertain. The learner's regret is measured in terms of misclassification and abstention error, where we allow the learner to abstain for free on adversarially injected samples. Our approach builds on the work of Goel, Hanneke, Moran, and Shetty (arXiv:2306.13119). We examine the methods they present and correct inaccuracies in their argumentation. However, their approach is limited to the realizable setting, where labels are assigned according to some function $f^*$ from the hypothesis space $\mathcal{F}$. Building on similar arguments, we explore how to adapt these methods to the agnostic setting, where labels are random. By introducing the notion of a clean-label adversary in the agnostic context, we are the first to give a theoretical analysis of a disagreement-based learner for thresholds subject to a clean-label adversary with noise.
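To make the disagreement-based idea concrete, here is a minimal sketch of a threshold learner with abstention in the realizable case, under the standard assumption that each hypothesis $f_t$ labels $x$ positive iff $x \ge t$. The class name and methods are hypothetical; the paper's full algorithm additionally involves randomized rejection and handling of adversarially injected samples, which this sketch omits.

```python
class ThresholdDisagreementLearner:
    """Sketch: predict only where all thresholds consistent with the
    data agree; abstain inside the disagreement region."""

    def __init__(self):
        # Version space of consistent thresholds t, where f_t(x) = 1 iff x >= t.
        self.lo = float("-inf")  # every consistent t satisfies t > lo
        self.hi = float("inf")   # every consistent t satisfies t <= hi

    def predict(self, x):
        if x < self.lo:
            return 0      # all consistent thresholds label x negative
        if x >= self.hi:
            return 1      # all consistent thresholds label x positive
        return None       # disagreement region: abstain

    def update(self, x, y):
        # Shrink the version space with the observed labeled example.
        if y == 1:
            self.hi = min(self.hi, x)  # consistency forces t <= x
        else:
            self.lo = max(self.lo, x)  # consistency forces t > x
```

For example, after observing a negative example at 0.3 and a positive example at 0.7, the learner predicts 0 below 0.3, predicts 1 at or above 0.7, and abstains on points in between, since consistent thresholds still disagree there.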