🤖 AI Summary
Current large language model (LLM) agent harnesses manage state residency and durability on a best-effort basis, which leads to recurring failures: state lost after compaction, flushes bypassed on reset, and destructive writeback. This work proposes ClawVM, a virtual memory layer embedded in the agent harness that makes state residency and durability deterministic and auditable. Its core components are typed state pages, multi-resolution representations under a token budget, minimum-fidelity invariants, and validated writeback at every lifecycle boundary. Across synthetic workloads, 12 real-session traces, and adversarial stress tests, ClawVM eliminates all policy-controllable faults whenever the minimum-fidelity set fits within the token budget, as confirmed by an offline oracle, with a median policy-engine overhead below 50 microseconds per turn.
📝 Abstract
Stateful tool-using LLM agents treat the context window as working memory, yet today's agent harnesses manage residency and durability on a best-effort basis, causing recurring failures: lost state after compaction, bypassed flushes on reset, and destructive writeback. We present ClawVM, a virtual memory layer that manages state as typed pages with minimum-fidelity invariants, multi-resolution representations under a token budget, and validated writeback at every lifecycle boundary. Because the harness already assembles prompts, mediates tools, and observes lifecycle events, it is the natural enforcement point; placing the contract there makes residency and durability deterministic and auditable. Across synthetic workloads, 12 real-session traces, and adversarial stress tests, ClawVM eliminates all policy-controllable faults whenever the minimum-fidelity set fits within the token budget, as confirmed by an offline oracle, and adds a median of less than 50 microseconds of policy-engine overhead per turn.
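To make the idea of typed pages with minimum-fidelity invariants and multi-resolution representations more concrete, here is a minimal Python sketch of how a harness might choose a resolution for each page under a token budget. It is an illustration only, not ClawVM's implementation: the names `Page` and `plan_residency`, the per-level token costs, the priority field, and the greedy upgrade policy are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Page:
    """Hypothetical typed state page (all field names are illustrative)."""
    name: str
    costs: Tuple[int, ...]  # token cost per resolution, lowest fidelity first
    min_level: int          # minimum-fidelity invariant: never drop below this
    priority: int = 0       # higher-priority pages are upgraded first


def plan_residency(pages: List[Page], budget: int) -> Tuple[Dict[str, int], int]:
    """Pick one resolution level per page so the total fits the budget.

    Every page starts at its minimum-fidelity level. If even that set does
    not fit, the function raises instead of silently dropping state.
    """
    levels = {p.name: p.min_level for p in pages}
    used = sum(p.costs[p.min_level] for p in pages)
    if used > budget:
        raise ValueError("minimum-fidelity set exceeds token budget")

    # Assumed policy: greedily upgrade pages in priority order, one
    # resolution level at a time, while the budget allows.
    for p in sorted(pages, key=lambda page: -page.priority):
        while levels[p.name] + 1 < len(p.costs):
            current = levels[p.name]
            extra = p.costs[current + 1] - p.costs[current]
            if used + extra > budget:
                break
            levels[p.name] = current + 1
            used += extra
    return levels, used


if __name__ == "__main__":
    pages = [
        Page("task_plan", costs=(20, 80, 300), min_level=1, priority=2),
        Page("tool_log", costs=(10, 50, 400), min_level=0, priority=1),
    ]
    print(plan_residency(pages, budget=200))
```

The explicit error when the minimum-fidelity set does not fit mirrors the precondition stated in the abstract: the guarantee applies only when that set fits within the token budget.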