Mind the Gap: Time-of-Check to Time-of-Use Vulnerabilities in LLM-Enabled Agents

📅 2025-08-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper identifies, for the first time, time-of-check to time-of-use (TOCTOU) vulnerabilities in large language model (LLM)-enabled agents: external state (e.g., files, API responses) is validated by the agent but maliciously altered before subsequent use, enabling attacks such as configuration hijacking or payload injection. Method: The authors introduce TOCTOU-Bench, the first dedicated benchmark for this threat class, comprising 66 realistic tasks, and propose a detection and mitigation framework combining prompt rewriting, runtime state integrity monitoring, and tool-fusing, adapting classical systems-security mechanisms to the LLM-agent setting. Contribution/Results: Experiments show up to 25% automated TOCTOU detection accuracy, a 3% reduction in vulnerable plan generation, a 95% reduction in the attack window, and a drop in TOCTOU vulnerabilities in executed trajectories from 12% to 8%.

📝 Abstract
Large Language Model (LLM)-enabled agents are rapidly emerging across a wide range of applications, but their deployment introduces vulnerabilities with security implications. While prior work has examined prompt-based attacks (e.g., prompt injection) and data-oriented threats (e.g., data exfiltration), time-of-check to time-of-use (TOCTOU) vulnerabilities remain largely unexplored in this context. TOCTOU arises when an agent validates external state (e.g., a file or API response) that is later modified before use, enabling practical attacks such as malicious configuration swaps or payload injection. In this work, we present the first study of TOCTOU vulnerabilities in LLM-enabled agents. We introduce TOCTOU-Bench, a benchmark with 66 realistic user tasks designed to evaluate this class of vulnerabilities. As countermeasures, we adapt detection and mitigation techniques from systems security to this setting and propose prompt rewriting, state integrity monitoring, and tool-fusing. Our study highlights challenges unique to agentic workflows, where we achieve up to 25% detection accuracy using automated detection methods, a 3% decrease in vulnerable plan generation, and a 95% reduction in the attack window. Combining all three approaches, we reduce the TOCTOU vulnerabilities in an executed trajectory from 12% to 8%. Our findings open a new research direction at the intersection of AI safety and systems security.
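The check-then-use pattern the abstract describes can be illustrated with a minimal sketch (all names here are hypothetical illustrations, not from the paper): the agent validates a config file, then re-reads it after an intervening planning step, leaving a window in which an attacker can swap its contents.

```python
import json
from pathlib import Path

# Hypothetical allow-list standing in for whatever policy the agent enforces.
ALLOWED_ENDPOINTS = {"https://api.example.com"}

def check_config(path: Path) -> bool:
    """Time-of-check: validate that the config targets an allowed endpoint."""
    cfg = json.loads(path.read_text())
    return cfg.get("endpoint") in ALLOWED_ENDPOINTS

def run_agent_step(path: Path, between_check_and_use=lambda: None) -> str:
    """Check the file, then use it again later in the agent's trajectory."""
    if not check_config(path):
        raise ValueError("config failed validation")
    # The callback stands in for the attack window: planning, LLM calls, etc.
    between_check_and_use()
    # Time-of-use: the file is re-read, so a swap after the check goes unnoticed.
    cfg = json.loads(path.read_text())
    return cfg["endpoint"]
```

Calling `run_agent_step` with a callback that rewrites the file shows the malicious endpoint being used even though validation passed.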
Problem

Research questions and friction points this paper is trying to address.

Identifying TOCTOU vulnerabilities in LLM-enabled agents
Characterizing flaws where external state is validated but then modified before use
Proposing countermeasures that close the check-to-use attack window
Innovation

Methods, ideas, or system contributions that make the work stand out.

TOCTOU-Bench benchmark for vulnerability evaluation
Prompt rewriting and state integrity monitoring
Tool-fusing to reduce the attack window
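The state-integrity-monitoring idea listed above can be sketched as binding a cryptographic digest to the check, so any modification before use is detected. This is a hypothetical illustration under that assumption, not the authors' implementation:

```python
import hashlib
import json
from pathlib import Path

def check_and_fingerprint(path: Path) -> str:
    """Time-of-check: validate the state and record a SHA-256 digest of it."""
    data = path.read_bytes()
    json.loads(data)  # stand-in validation: must at least be well-formed JSON
    return hashlib.sha256(data).hexdigest()

def use_with_integrity(path: Path, expected_digest: str) -> dict:
    """Time-of-use: refuse to proceed if the state changed since the check."""
    data = path.read_bytes()
    if hashlib.sha256(data).hexdigest() != expected_digest:
        raise RuntimeError("state changed between check and use")
    return json.loads(data)
```

An unchanged file passes; a file swapped after the check raises at use time, collapsing the TOCTOU window to the single re-read.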