🤖 AI Summary
To address two practical issues in wireless anti-jamming, namely state uncertainty caused by sensor measurement errors and mismatched execution latencies across heterogeneous actions, this paper proposes a multi-timescale robust decision-making framework. In contrast to conventional methods that optimize on a single timescale and assume ideal sensing, we design two algorithms: PGD-DDQN, which is trained against worst-case perturbations found via projected gradient descent, and NQC-DDQN, which applies nonlinear Q-value compression to mitigate action aliasing. Our approach combines projection-based adversarial training, robust optimization, and multi-timescale modeling. Experimental results show that the proposed methods perform close to the ideal-sensing baseline across diverse sensor perturbations and significantly outperform existing deep reinforcement learning and classical interference-resilient approaches. The framework is robust to sensing uncertainty, which supports its practicality under imperfect sensing conditions.
📝 Abstract
Owing to the openness of wireless channels, wireless communication systems are highly susceptible to malicious jamming. Most existing anti-jamming methods assume accurate sensing and optimize parameters on a single timescale. Such methods overlook two practical issues: mismatched execution latencies across heterogeneous actions and measurement errors caused by sensor imperfections. The second issue is particularly serious for deep reinforcement learning (DRL)-based methods, because neural networks are sensitive to their inputs: even minor perturbations can mislead the agent into choosing suboptimal actions, with potentially severe consequences. To ensure reliable wireless transmission, we establish a multi-timescale decision model that incorporates state uncertainty. We then propose two robust schemes that sustain performance under bounded sensing errors. First, a Projected Gradient Descent-assisted Double Deep Q-Network (PGD-DDQN) algorithm derives worst-case perturbations under a norm-bounded error model and applies PGD during training for robust optimization. Second, a Nonlinear Q-Compression DDQN (NQC-DDQN) algorithm introduces a nonlinear compression mechanism that adaptively contracts Q-value ranges to eliminate action aliasing. Simulation results indicate that, compared with the perfect-sensing baseline, the proposed algorithms show only minor degradation in anti-jamming performance while remaining robust under various perturbations, which validates their practicality under imperfect sensing conditions.
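The worst-case perturbation search mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: a tiny linear Q-function stands in for the DDQN so the gradient is exact, the error model is assumed to be an L-infinity ball, and every name (`q_values`, `pgd_perturb`, `epsilon`, `step`, `iters`) is hypothetical. Only the inner perturbation step is shown; how the perturbed observations enter the training loss is not specified in the abstract and is left out.

```python
# Illustrative sketch only. A linear Q-function replaces the neural network,
# and the sensing error is assumed to be bounded in the L-infinity norm.

def q_values(weights, bias, state):
    """Return Q(s, a) = w_a . s + b_a for every action a."""
    return [sum(w * x for w, x in zip(row, state)) + b
            for row, b in zip(weights, bias)]


def greedy(qs):
    """Index of the action with the largest Q-value."""
    return max(range(len(qs)), key=lambda a: qs[a])


def sign(value):
    """Sign of a float as -1, 0, or 1."""
    return (value > 0) - (value < 0)


def pgd_perturb(weights, bias, state, epsilon, step, iters):
    """Search the ball of radius epsilon around `state` for an observation
    that shrinks the margin of the clean greedy action."""
    clean_action = greedy(q_values(weights, bias, state))
    perturbed = list(state)
    for _ in range(iters):
        qs = q_values(weights, bias, perturbed)
        # Strongest competing action at the current perturbed observation.
        rival = max((a for a in range(len(qs)) if a != clean_action),
                    key=lambda a: qs[a])
        # Gradient of Q[rival] - Q[clean_action] w.r.t. the observation
        # (exact here because Q is linear in the state).
        grad = [wr - wc for wr, wc in zip(weights[rival], weights[clean_action])]
        # Signed gradient step that favours the rival action.
        perturbed = [x + step * sign(g) for x, g in zip(perturbed, grad)]
        # Projection back onto the norm ball around the clean state.
        perturbed = [min(max(x, s - epsilon), s + epsilon)
                     for x, s in zip(perturbed, state)]
    return perturbed


# Toy example: two actions, two state features (values chosen arbitrarily).
weights = [[1.0, 0.0], [0.0, 1.0]]
bias = [0.0, 0.0]
state = [0.6, 0.5]
epsilon = 0.1

adversarial = pgd_perturb(weights, bias, state, epsilon, step=0.05, iters=5)
clean_choice = greedy(q_values(weights, bias, state))
perturbed_choice = greedy(q_values(weights, bias, adversarial))
```

In this toy case a sensing error no larger than `epsilon` per feature is enough to change the greedy action, which is the kind of failure the abstract attributes to input sensitivity. A robust training scheme would presumably penalize such flips, but the exact loss used by PGD-DDQN is not given here.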