AI Summary
This study systematically evaluates the practical capabilities of large language models (LLMs) in enterprise network penetration testing, focusing on automated attack scenarios within simulated Microsoft Active Directory environments. By constructing a reproducible experimental framework that integrates multiple LLM interfaces and custom evaluation scripts, this work provides the first empirical validation of the feasibility of using LLMs to execute complex penetration tasks in realistic enterprise red-team simulations. The research reproduces and extends prior findings, demonstrating that certain LLMs can effectively conduct penetration activities under specific conditions. Furthermore, the authors open-source the complete toolchain and experimental pipeline, significantly improving reproducibility and standardization in AI-driven security research.
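To make the framework description concrete, below is a minimal sketch, assuming a Python prototype, of how an attack loop might abstract over multiple LLM back-ends behind one interface. The class names, the sandboxed `execute` callback, and the prompt wording are illustrative assumptions, not the authors' actual code; only the `openai` client calls reflect a real library API.

```python
# Hypothetical sketch (not the authors' code): a model-agnostic driver that
# integrates multiple LLM interfaces behind one abstraction, as the summary
# describes. All names here are assumptions for illustration.
from abc import ABC, abstractmethod


class LLMBackend(ABC):
    """Common interface so the attack loop stays model-agnostic."""

    @abstractmethod
    def next_command(self, history: list[str]) -> str:
        """Given the transcript so far, propose the next shell command."""


class OpenAIBackend(LLMBackend):
    def __init__(self, model: str = "gpt-4o"):
        from openai import OpenAI  # assumes the official openai package
        self.client = OpenAI()     # reads OPENAI_API_KEY from the environment
        self.model = model

    def next_command(self, history: list[str]) -> str:
        resp = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": "\n".join(history)}],
        )
        return resp.choices[0].message.content.strip()


def attack_loop(llm: LLMBackend, execute, max_steps: int = 20) -> list[str]:
    """Drive an assumed-breach round: ask the model for a command, run it
    against the lab environment via a caller-supplied sandboxed executor,
    then feed the output back into the transcript."""
    history = ["You are inside an Active Directory lab. Suggest one command."]
    for _ in range(max_steps):
        cmd = llm.next_command(history)
        output = execute(cmd)  # sandboxed executor, supplied by the caller
        history += [f"$ {cmd}", output]
    return history
```

Swapping in another provider then only requires a second `LLMBackend` subclass; the loop itself is unchanged, which is the point of the abstraction.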
Abstract
This is the Replicated Computational Results (RCR) Report for the paper "Can LLMs Hack Enterprise Networks?". The paper empirically investigates the efficacy and effectiveness of different LLMs for penetration-testing enterprise networks, i.e., Microsoft Active Directory Assumed-Breach simulations. This RCR report describes the artifacts used in the paper, explains how to create an evaluation setup, and highlights the analysis scripts provided within our prototype.
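As an illustration of the kind of analysis script the report highlights, here is a minimal sketch assuming per-run JSON logs with `model` and `goal_reached` fields; this schema and the `results/` directory are hypothetical placeholders, not the artifact's real layout.

```python
# Hypothetical sketch of an analysis step: aggregating per-run logs into a
# success-rate table per model. File layout and field names are assumptions,
# not the artifact's actual schema.
import json
from collections import defaultdict
from pathlib import Path


def success_rates(log_dir: str) -> dict[str, float]:
    """Assumes each JSON log records {'model': str, 'goal_reached': bool}."""
    wins, runs = defaultdict(int), defaultdict(int)
    for path in Path(log_dir).glob("*.json"):
        run = json.loads(path.read_text())
        runs[run["model"]] += 1
        wins[run["model"]] += bool(run["goal_reached"])
    return {m: wins[m] / runs[m] for m in runs}


if __name__ == "__main__":
    for model, rate in sorted(success_rates("results/").items()):
        print(f"{model}: {rate:.0%} of runs reached the objective")
```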