When Can You Poison Rewards? A Tight Characterization of Reward Poisoning in Linear MDPs

📅 2026-04-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work investigates the vulnerability of linear Markov decision processes (MDPs) to reward poisoning attacks, wherein an adversary manipulates rewards within a limited budget to steer a reinforcement learning agent toward executing a target policy. We establish, for the first time, necessary and sufficient conditions under which a linear MDP can be successfully attacked, precisely delineating the boundary between instances that are inherently robust and those susceptible to manipulation. The theoretical framework is further extended to deep reinforcement learning settings by leveraging linear approximation techniques, enabling effective analysis in complex environments. Empirical evaluations demonstrate that the proposed approach not only accurately identifies attackable instances but also executes attacks efficiently, thereby validating both the rigor and practical relevance of the theoretical results.

📝 Abstract

We study reward poisoning attacks in reinforcement learning (RL), where an adversary manipulates rewards within a constrained budget to force the target RL agent to adopt a policy that aligns with the attacker's objectives. Prior work on reward poisoning has mainly focused on sufficient conditions for designing a successful attacker, while only a few studies have discussed the infeasibility of targeted attacks. This paper provides the first precise characterization, both necessary and sufficient, of the attackability of a linear MDP under reward poisoning attacks. Our characterization draws a bright line between vulnerable RL instances and intrinsically robust ones, which cannot be attacked without large costs even when the agent runs vanilla non-robust RL algorithms. Our theory extends beyond linear MDPs: by approximating deep RL environments as linear MDPs, we show that our theoretical framework effectively distinguishes attackable environments from robust ones and efficiently attacks the vulnerable ones, demonstrating both the theoretical and practical significance of our characterization.
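
To make the attack setting concrete, here is a minimal sketch of targeted reward poisoning in a small tabular MDP. It is a toy illustration under stated assumptions, not the paper's linear MDP characterization or its attack algorithm: the function names (`evaluate_policy`, `poison_rewards`, `optimal_policy`) are hypothetical, the attacker is assumed to know the transitions and rewards, and the largest per-entry reward change stands in for the attack budget. The sketch lowers the reward of every non-target action just enough that the target policy becomes strictly optimal.

```python
import numpy as np

def evaluate_policy(P, r, policy, gamma):
    """Exact Q and V of a deterministic policy in a tabular MDP.
    P: (S, A, S) transition probabilities, r: (S, A) rewards,
    policy: (S,) array of action indices."""
    S = r.shape[0]
    idx = np.arange(S)
    P_pi = P[idx, policy]                    # (S, S) transitions under the policy
    r_pi = r[idx, policy]                    # (S,) rewards under the policy
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    Q = r + gamma * P @ V                    # (S, A)
    return Q, V

def poison_rewards(P, r, target, gamma, margin=0.1):
    """Toy attack (an assumption, not the paper's method): reduce rewards of
    non-target actions so the target policy wins by at least `margin`.
    Returns the poisoned rewards and the largest per-entry change."""
    Q, V = evaluate_policy(P, r, target, gamma)
    delta = np.maximum(0.0, Q - V[:, None] + margin)
    delta[np.arange(len(target)), target] = 0.0   # target actions are untouched
    return r - delta, delta.max()

def optimal_policy(P, r, gamma, iters=2000):
    """Greedy policy from plain value iteration (a vanilla, non-robust learner)."""
    Q = np.zeros_like(r, dtype=float)
    for _ in range(iters):
        Q = r + gamma * P @ Q.max(axis=1)
    return Q.argmax(axis=1)

# Small random MDP: 4 states, 3 actions.
rng = np.random.default_rng(0)
S, A, gamma = 4, 3, 0.9
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)            # normalize rows into distributions
r = rng.random((S, A))
target = np.array([2, 0, 1, 2])              # policy the attacker wants to induce

r_poisoned, cost = poison_rewards(P, r, target, gamma)
print("policy on poisoned rewards:", optimal_policy(P, r_poisoned, gamma))
print("max per-entry reward change:", cost)
```

In this toy setting the attack always succeeds and the only question is its cost. The paper's question is sharper: it asks when such an attack is feasible within a limited budget in a linear MDP, and when the instance is intrinsically robust.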

Problem

Research questions and friction points this paper is trying to address.

reward poisoning

linear MDPs

attackability

reinforcement learning

adversarial attacks

Innovation

Methods, ideas, or system contributions that make the work stand out.

reward poisoning

linear MDPs

attackability characterization