SoK: The Pitfalls of Deep Reinforcement Learning for Cybersecurity

📅 2026-02-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses why applications of deep reinforcement learning (DRL) to cybersecurity (DRL4Sec) often fail to translate into practical solutions, attributing the gap to persistent methodological flaws. Through a systematic analysis of 66 published works and three controlled experimental studies—spanning autonomous cyber defense, adversarial malware generation, and web security testing—the authors identify and categorize 11 common methodological pitfalls, quantifying their prevalence and real-world impact. The findings reveal that papers in this domain exhibit an average of more than five such pitfalls, which significantly degrade system performance. This work provides the first comprehensive taxonomy of methodological issues in DRL4Sec and offers actionable recommendations to guide future research toward more rigorous and reproducible practices.

📝 Abstract
Deep Reinforcement Learning (DRL) has achieved remarkable success in domains requiring sequential decision-making, motivating its application to cybersecurity problems. However, transitioning DRL from laboratory simulations to bespoke cyber environments can introduce numerous issues. This is further exacerbated by the often adversarial, non-stationary, and partially-observable nature of most cybersecurity tasks. In this paper, we identify and systematize 11 methodological pitfalls that frequently occur in DRL for cybersecurity (DRL4Sec) literature across the stages of environment modeling, agent training, performance evaluation, and system deployment. By analyzing 66 significant DRL4Sec papers (2018-2025), we quantify the prevalence of each pitfall and find an average of over five pitfalls per paper. We demonstrate the practical impact of these pitfalls using controlled experiments in (i) autonomous cyber defense, (ii) adversarial malware creation, and (iii) web security testing environments. Finally, we provide actionable recommendations for each pitfall to support the development of more rigorous and deployable DRL-based security systems.
Problem

Research questions and friction points this paper is trying to address.

Deep Reinforcement Learning
Cybersecurity
Methodological Pitfalls
Non-stationary Environments
Partial Observability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deep Reinforcement Learning
Cybersecurity
Methodological Pitfalls
Systematic Review
Adversarial Environments