Quantifying Frontier LLM Capabilities for Container Sandbox Escape

📅 2026-03-01
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the emerging security threat posed by state-of-the-art large language models (LLMs) operating as autonomous agents within containerized sandbox environments, where they may exploit system vulnerabilities to achieve escape. To systematically evaluate this risk, we introduce SANDBOXESCAPEBENCH—the first comprehensive benchmark encompassing four categories of escape scenarios: misconfigurations, privilege abuse, kernel flaws, and runtime weaknesses. Employing a nested sandbox architecture and a CTF-inspired evaluation paradigm, our framework safely quantifies the escape capabilities of LLMs granted shell access. Experiments conducted using the Inspect AI framework with Docker/OCI containers demonstrate that current LLMs can effectively identify and exploit real-world vulnerabilities to escape confinement, underscoring the critical role of this benchmark in assessing and enhancing the deployment security of LLM-based agents.

Technology Category

Application Category

📝 Abstract
Large language models (LLMs) increasingly act as autonomous agents, using tools to execute code, read and write files, and access networks, creating novel security risks. To mitigate these risks, agents are commonly deployed and evaluated in isolated "sandbox" environments, often implemented using Docker/OCI containers. We introduce SANDBOXESCAPEBENCH, an open benchmark that safely measures an LLM's capacity to break out of these sandboxes. The benchmark is implemented as an Inspect AI Capture the Flag (CTF) evaluation utilising a nested sandbox architecture with the outer layer containing the flag and no known vulnerabilities. Following a threat model of a motivated adversarial agent with shell access inside a container, SANDBOXESCAPEBENCH covers a spectrum of sandboxescape mechanisms spanning misconfiguration, privilege allocation mistakes, kernel flaws, and runtime/orchestration weaknesses. We find that, when vulnerabilities are added, LLMs are able to identify and exploit them, showing that use of evaluation like SANDBOXESCAPEBENCH is needed to ensure sandboxing continues to provide the encapsulation needed for highly-capable models.
Problem

Research questions and friction points this paper is trying to address.

LLM
sandbox escape
container security
AI safety
adversarial agent
Innovation

Methods, ideas, or system contributions that make the work stand out.

sandbox escape
large language models
container security
adversarial evaluation
nested sandbox
🔎 Similar Papers
No similar papers found.
R
Rahul Marchand
University of Oxford
A
Art O Cathain
UK AI Security Institute
J
Jerome Wynne
UK AI Security Institute
P
Philippos Maximos Giavridis
UK AI Security Institute
S
Sam Deverett
UK AI Security Institute
J
John Wilkinson
UK AI Security Institute
J
Jason Gwartz
UK AI Security Institute
Harry Coppock
Harry Coppock
Imperial College London
Deep LearningSignal ProcessingAudioRepresentation LearningQuantisation