When Critics Disagree: Adaptive Reward Poisoning Attacks in RIS-Aided Wireless Control System

📅 2026-05-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the vulnerability of learning-based wireless control systems in reconfigurable intelligent surface (RIS)-assisted cognitive radio networks to reward poisoning attacks, which can mislead deep reinforcement learning (DRL) agents, such as those using Soft Actor-Critic (SAC), into suboptimal policies. The paper proposes an adaptive reward poisoning attack named Disagreement-Guided Reward Poisoning (DGRP), which uses the disagreement between the outputs of SAC's dual critics as a trigger to inject malicious rewards during states of high uncertainty. The attack is precisely targeted and stealthy, and it undermines the performance gains offered by RIS. Experimental results show that DGRP causes more damage than baseline strategies such as periodic and exploration-triggered attacks, exposing a security vulnerability in DRL-driven RIS systems.

📝 Abstract

Reward-poisoning attacks present a significant risk to learning-based wireless control systems. To study this risk, we propose Disagreement-Guided Reward Poisoning (DGRP), an adaptive attack on a Soft Actor-Critic (SAC) agent. In a Cognitive Radio Network (CRN) environment assisted by Reconfigurable Intelligent Surfaces (RIS), the SAC agent is tasked with maximizing the long-term rate of the secondary users (SUs) by jointly optimizing the transmission power of the SU transmitter and the RIS phase shifts. DGRP corrupts rewards when the SAC dual critics exhibit substantial disagreement, especially in high-leverage, high-uncertainty states, which distorts value estimates and guides the policy towards suboptimal actions. Our findings demonstrate that DGRP substantially diminishes the performance improvements typically provided by RIS and degrades transmission quality. We further investigate key attack parameters and determine their impact on learning. In comparison to periodic-timing and exploration-triggered baselines, DGRP consistently causes greater damage, highlighting the necessity of considering disagreement-aware threats when evaluating the robustness of Deep Reinforcement Learning (DRL) in RIS-assisted networks.
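
The trigger described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the trigger rule or the form of the corruption, so the absolute-difference test, the `threshold` value, the sign-flip `scale`, and the function names `disagreement_mask`, `periodic_mask`, and `poison_rewards` are all assumptions made for illustration. A simple periodic-timing mask is included only to contrast with the disagreement-based one.

```python
import numpy as np

def disagreement_mask(q1, q2, threshold):
    """True where the two critic estimates differ by more than `threshold`.
    Assumption: disagreement is measured as |Q1 - Q2|."""
    return np.abs(np.asarray(q1, dtype=float) - np.asarray(q2, dtype=float)) > threshold

def periodic_mask(num_steps, period):
    """Hypothetical periodic-timing baseline: attack every `period`-th step."""
    return (np.arange(num_steps) % period) == 0

def poison_rewards(rewards, mask, scale=-1.0):
    """Corrupt rewards where `mask` is True.
    Assumption: the corruption multiplies the reward by `scale` (a sign flip by default)."""
    rewards = np.asarray(rewards, dtype=float)
    return np.where(mask, scale * rewards, rewards)

# Toy batch: the critics disagree strongly only on the second transition.
rewards = [1.0, 2.0, 3.0]
q1 = [0.0, 1.0, 5.0]
q2 = [0.1, 2.0, 5.2]

mask = disagreement_mask(q1, q2, threshold=0.5)
poisoned = poison_rewards(rewards, mask)
```

In this toy batch only the second reward is altered, because that is the only transition where the critic gap exceeds the threshold. A periodic mask, by contrast, selects steps without regard to the critics' state.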

Problem

Research questions and friction points this paper is trying to address.

reward poisoning

RIS-aided wireless control

Deep Reinforcement Learning

critic disagreement

adversarial attack

Innovation

Methods, ideas, or system contributions that make the work stand out.

Reward Poisoning

Disagreement-Guided Attack

Reconfigurable Intelligent Surface (RIS)