AI Summary
This study systematically evaluates the practical capabilities of large language models (LLMs) in enterprise network penetration testing, focusing on automated attack scenarios within simulated Microsoft Active Directory environments. By constructing a reproducible experimental framework that integrates multiple LLM interfaces and custom evaluation scripts, this work provides the first empirical validation of the feasibility of using LLMs to execute complex penetration tasks in realistic enterprise red-team simulations. The research reproduces and extends prior findings, demonstrating that certain LLMs can effectively conduct penetration activities under specific conditions. Furthermore, the authors open-source the complete toolchain and experimental pipeline, significantly improving reproducibility and standardization in AI-driven security research.
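To make the framework description concrete, below is a minimal sketch, assuming a Python prototype, of how an attack loop might abstract over multiple LLM back-ends behind one interface. The class names, the sandboxed `execute` callback, and the prompt wording are illustrative assumptions, not the authors' actual code; only the `openai` client calls reflect a real library API.

```python
# Hypothetical sketch (not the authors' code): a model-agnostic driver that
# integrates multiple LLM interfaces behind one abstraction, as the summary
# describes. All names here are assumptions for illustration.
from abc import ABC, abstractmethod


class LLMBackend(ABC):
    """Common interface so the attack loop stays model-agnostic."""

    @abstractmethod
    def next_command(self, history: list[str]) -> str:
        """Given the transcript so far, propose the next shell command."""


class OpenAIBackend(LLMBackend):
    def __init__(self, model: str = "gpt-4o"):
        from openai import OpenAI  # assumes the official openai package
        self.client = OpenAI()     # reads OPENAI_API_KEY from the environment
        self.model = model

    def next_command(self, history: list[str]) -> str:
        resp = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": "\n".join(history)}],
        )
        return resp.choices[0].message.content.strip()


def attack_loop(llm: LLMBackend, execute, max_steps: int = 20) -> list[str]:
    """Drive an assumed-breach round: ask the model for a command, run it
    against the lab environment via a caller-supplied sandboxed executor,
    then feed the output back into the transcript."""
    history = ["You are inside an Active Directory lab. Suggest one command."]
    for _ in range(max_steps):
        cmd = llm.next_command(history)
        output = execute(cmd)  # sandboxed executor, supplied by the caller
        history += [f"$ {cmd}", output]
    return history
```

Swapping in another provider then only requires a second `LLMBackend` subclass; the loop itself is unchanged, which is the point of the abstraction.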
Abstract
This is the Replicated Computational Results (RCR) Report for the paper "Can LLMs Hack Enterprise Networks?". The paper empirically investigates the efficacy and effectiveness of different LLMs for penetration-testing enterprise networks, i.e., Microsoft Active Directory Assumed-Breach simulations. This RCR report describes the artifacts used in the paper, explains how to create an evaluation setup, and highlights the analysis scripts provided within our prototype.
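As an illustration of the kind of analysis script the report highlights, here is a minimal sketch assuming per-run JSON logs with `model` and `goal_reached` fields; this schema and the `results/` directory are hypothetical placeholders, not the artifact's real layout.

```python
# Hypothetical sketch of an analysis step: aggregating per-run logs into a
# success-rate table per model. File layout and field names are assumptions,
# not the artifact's actual schema.
import json
from collections import defaultdict
from pathlib import Path


def success_rates(log_dir: str) -> dict[str, float]:
    """Assumes each JSON log records {'model': str, 'goal_reached': bool}."""
    wins, runs = defaultdict(int), defaultdict(int)
    for path in Path(log_dir).glob("*.json"):
        run = json.loads(path.read_text())
        runs[run["model"]] += 1
        wins[run["model"]] += bool(run["goal_reached"])
    return {m: wins[m] / runs[m] for m in runs}


if __name__ == "__main__":
    for model, rate in sorted(success_rates("results/").items()):
        print(f"{model}: {rate:.0%} of runs reached the objective")
```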