Learning to Attack: A Bandit Approach to Adversarial Context Poisoning

📅 2026-02-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the vulnerability of neural contextual bandits to adversarial context poisoning, where minimal perturbations to observed contexts can induce suboptimal decisions. It is the first to formulate such attacks as a continuous-armed bandit problem, and it introduces a black-box adaptive attack framework that operates without access to the victim's internal parameters. The framework builds a surrogate model of the victim via maximum-entropy inverse reinforcement learning and optimizes the perturbation strategy with an upper-confidence-bound-aware Gaussian process combined with projected gradient descent, under an explicit attack-budget constraint. Theoretical analysis shows that the attacker achieves sublinear regret while the victim suffers a regret lower bound that is linear in the number of attacks. Experiments on the Yelp, MovieLens, and Disin datasets demonstrate a significant increase in the victim's cumulative regret over existing baselines.
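The budget-constrained perturbation step described above can be sketched as projected gradient descent against a surrogate objective. This is a minimal illustration, not the paper's implementation: the function names, the toy quadratic surrogate, and the choice of an L2 budget ball are all assumptions for the example.

```python
import numpy as np

def pgd_context_poison(context, surrogate_grad, budget, steps=20, lr=0.1):
    """Sketch of budget-constrained PGD: ascend a surrogate attack
    objective on the context, then project the perturbation back onto
    an L2 ball of radius `budget` (the attack-budget constraint)."""
    delta = np.zeros_like(context)
    for _ in range(steps):
        # gradient ascent on the surrogate attack objective
        delta += lr * surrogate_grad(context + delta)
        # project onto the budget ball: ||delta||_2 <= budget
        norm = np.linalg.norm(delta)
        if norm > budget:
            delta *= budget / norm
    return context + delta

# toy surrogate (hypothetical): pull the context toward a target direction,
# i.e. the gradient of -0.5 * ||x - target||^2
target = np.array([1.0, -1.0, 0.5])
grad = lambda x: target - x

x = np.zeros(3)
x_adv = pgd_context_poison(x, grad, budget=0.3)
```

The projection step is what keeps the per-round perturbation small enough to limit detection risk, as the summary notes.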

📝 Abstract
Neural contextual bandits are vulnerable to adversarial attacks, where subtle perturbations to rewards, actions, or contexts induce suboptimal decisions. We introduce AdvBandit, a black-box adaptive attack that formulates context poisoning as a continuous-armed bandit problem, enabling the attacker to jointly learn and exploit the victim's evolving policy. The attacker requires no access to the victim's internal parameters, reward function, or gradient information; instead, it constructs a surrogate model using a maximum-entropy inverse reinforcement learning module from observed context-action pairs and optimizes perturbations against this surrogate using projected gradient descent. An upper confidence bound-aware Gaussian process guides arm selection. An attack-budget control mechanism is also introduced to limit detection risk and overhead. We provide theoretical guarantees, including sublinear attacker regret and lower bounds on victim regret linear in the number of attacks. Experiments on three real-world datasets (Yelp, MovieLens, and Disin) against various victim contextual bandits demonstrate that our attack model achieves higher cumulative victim regret than state-of-the-art baselines.
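The "upper confidence bound-aware Gaussian process guides arm selection" step can be illustrated with a small GP-UCB acquisition over candidate perturbation scales. This is a hedged sketch under stated assumptions: the one-dimensional arm space, the RBF kernel length-scale, the `beta` exploration weight, and the toy (scale, induced-regret) observations are all invented for illustration and are not taken from the paper.

```python
import numpy as np

def rbf(a, b, ls=0.2):
    """RBF kernel matrix between two 1-D arrays of points."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_ucb_select(X, y, candidates, beta=2.0, noise=1e-4):
    """Sketch of UCB-aware GP arm selection: fit a GP posterior to
    observed (arm, reward) pairs and pick the candidate maximizing
    posterior mean + beta * posterior std."""
    K = rbf(X, X) + noise * np.eye(len(X))   # train covariance
    Ks = rbf(candidates, X)                  # cross-covariance
    mu = Ks @ np.linalg.solve(K, y)          # posterior mean
    v = np.linalg.solve(K, Ks.T)
    # posterior variance: k(x*,x*) - k*^T K^-1 k*, with k(x*,x*) = 1
    var = np.clip(1.0 - np.sum(Ks * v.T, axis=1), 0.0, None)
    ucb = mu + beta * np.sqrt(var)
    return candidates[np.argmax(ucb)]

# hypothetical history: perturbation scale -> victim regret induced
X = np.array([0.05, 0.15, 0.25])
y = np.array([0.10, 0.40, 0.35])
grid = np.linspace(0.0, 0.3, 61)
next_scale = gp_ucb_select(X, y, grid)
```

The UCB term trades off exploiting scales already known to hurt the victim against exploring under-sampled regions of the continuous arm space, which is how the attacker can learn the victim's evolving policy while attacking it.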
Problem

Research questions and friction points this paper is trying to address.

adversarial attack
context poisoning
neural contextual bandits
black-box attack
bandit problem
Innovation

Methods, ideas, or system contributions that make the work stand out.

adversarial attack
contextual bandits
black-box attack
inverse reinforcement learning
Gaussian process