AI Summary
This work addresses a critical security vulnerability in large language model (LLM) agents: during checkpoint-based recovery, inconsistent retry semantics can trigger irreversible side effects such as duplicate payments or credential misuse, a threat we term semantic rollback attacks. We formally define two novel attack classes, Action Replay and Authority Resurrection, and introduce ACRFence, a framework-agnostic defense mechanism. ACRFence ensures safe handling of irreversible operations by tracking tool invocation effects, logging execution states, and enforcing either replay or fork semantics upon recovery. We demonstrate the feasibility of these attacks through proof-of-concept experiments and validate the effectiveness of our approach, which has been acknowledged by maintainers of major LLM agent frameworks.
Abstract
LLM agent frameworks increasingly offer checkpoint-restore for error recovery and exploration, advising developers to make external tool calls safe to retry. This advice assumes that a retried call will be identical to the original, an assumption that holds for traditional programs but fails for LLM agents, which re-synthesize subtly different requests after restore. Servers treat these re-generated requests as new, enabling duplicate payments, unauthorized reuse of consumed credentials, and other irreversible side effects; we term these semantic rollback attacks. We identify two attack classes, Action Replay and Authority Resurrection, validate them in a proof-of-concept experiment, and confirm that the problem has been independently acknowledged by framework maintainers. We propose ACRFence, a framework-agnostic mitigation that records irreversible tool effects and enforces replay-or-fork semantics upon restoration.
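The replay-or-fork distinction described above can be illustrated with a minimal sketch. Note that `EffectLedger`, `step_id`, and the digest scheme below are illustrative assumptions for this sketch, not ACRFence's actual API: an irreversible tool call is logged under a deterministic key, and after restore an identical retry replays the recorded result while a re-synthesized (diverging) request is flagged for explicit handling rather than silently re-executed.

```python
import hashlib
import json


class EffectLedger:
    """Hypothetical sketch of effect tracking for checkpoint-restore.

    First execution of a step performs the irreversible call and logs its
    result. After a restore, an identical request replays the logged result
    (replay semantics); a re-synthesized, different request for the same step
    is reported as a fork instead of re-executing the side effect.
    """

    def __init__(self):
        self._log = {}  # step_id -> (request_digest, recorded_result)

    @staticmethod
    def _digest(request):
        # Canonical JSON so semantically identical requests hash equally.
        blob = json.dumps(request, sort_keys=True).encode()
        return hashlib.sha256(blob).hexdigest()

    def invoke(self, step_id, request, tool):
        digest = self._digest(request)
        if step_id in self._log:
            prev_digest, prev_result = self._log[step_id]
            if prev_digest == digest:
                return ("replay", prev_result)  # identical retry: no re-execution
            return ("fork", None)  # diverging request: require explicit approval
        result = tool(request)  # first execution: perform the irreversible call
        self._log[step_id] = (digest, result)
        return ("new", result)
```

A byte-identical retry after restore returns the logged result, while an LLM-re-synthesized request with, say, a different amount for the same step surfaces as a fork rather than a duplicate payment.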