CaveAgent: Transforming LLMs into Stateful Runtime Operators

πŸ“… 2026-01-04
πŸ›οΈ arXiv.org
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Current LLM-based agents are constrained by a purely textual paradigm, rendering them susceptible to context drift and fragile multi-turn dependencies in long-horizon tasks. This work proposes CaveAgent, a novel framework that introduces a stateful runtime mechanism, transforming the LLM from a text generator into a state-aware operator. By employing a dual-stream context architecture, CaveAgent decouples semantic reasoning from deterministic Python execution and enables cross-turn persistence and manipulation of complex objects. This approach transcends traditional text-binding limitations, effectively mitigating context drift and catastrophic forgetting. Evaluated on benchmarks such as TauΒ²-bench and BFCL, CaveAgent achieves substantial gains: a 10.5% improvement in retail task success rate, a 28.4% reduction in total token consumption across multi-turn scenarios, and up to 59% fewer tokens in data-intensive tasks, while successfully handling large-scale data that causes context overflow in competing methods.
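The page does not reproduce the paper's code, but the core stateful-runtime idea can be illustrated with a minimal sketch. All names below (`StatefulRuntime`, `run_turn`) are hypothetical stand-ins, assuming only that LLM-generated code executes against a namespace that persists across turns:

```python
# Illustrative sketch only; not the paper's implementation.
# Assumption: each agent turn emits Python code that runs against
# one shared namespace, so objects survive from turn to turn.

class StatefulRuntime:
    """A persistent namespace: objects created in turn N stay live in turn N+1."""

    def __init__(self):
        self.namespace = {}  # the deterministic execution stream's state

    def run_turn(self, code: str):
        # Execute LLM-generated code against the shared namespace.
        exec(code, self.namespace)

rt = StatefulRuntime()
# Turn 1: build a large intermediate object inside the runtime.
rt.run_turn("orders = [{'id': i, 'total': i * 10} for i in range(1000)]")
# Turn 2: a later turn reuses the object directly; only the small
# result needs to re-enter the lightweight semantic (text) stream.
rt.run_turn("summary = sum(o['total'] for o in orders)")
print(rt.namespace['summary'])  # 4995000
```

Because the 1000-element list never has to be serialized back into the prompt, the semantic stream stays small, which is the mechanism behind the token-reduction figures reported above.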

πŸ“ Abstract
LLM-based agents are increasingly capable of complex task execution, yet current agentic systems remain constrained by text-centric paradigms. Traditional approaches rely on procedural JSON-based function calling, which often struggles with long-horizon tasks due to fragile multi-turn dependencies and context drift. In this paper, we present CaveAgent, a framework that transforms the paradigm from "LLM-as-Text-Generator" to "LLM-as-Runtime-Operator." We introduce a Dual-stream Context Architecture that decouples state management into a lightweight semantic stream for reasoning and a persistent, deterministic Python runtime stream for execution. In addition to leveraging code generation to efficiently resolve interdependent sub-tasks (e.g., loops, conditionals) in a single step, we introduce Stateful Runtime Management in CaveAgent. Distinct from existing code-based approaches that remain text-bound and lack support for external object injection and retrieval, CaveAgent injects, manipulates, and retrieves complex Python objects (e.g., DataFrames, database connections) that persist across turns. This persistence mechanism acts as a high-fidelity external memory that eliminates context drift and avoids catastrophic forgetting, while ensuring that processed data flows losslessly to downstream applications. Comprehensive evaluations on TauΒ²-bench, BFCL, and various case studies across representative SOTA LLMs demonstrate CaveAgent's superiority. Specifically, our framework achieves a 10.5% success rate improvement on retail tasks and reduces total token consumption by 28.4% in multi-turn scenarios. On data-intensive tasks, direct variable storage and retrieval reduces token consumption by 59%, allowing CaveAgent to handle large-scale data that causes context-overflow failures in both JSON-based and code-based agents.
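The abstract's distinguishing claim is external object injection and retrieval: the host application hands live objects to the runtime, and processed results flow back without passing through the LLM's context. As a hedged sketch of that interface (the method names `inject`, `execute`, and `retrieve` are assumptions, not the paper's API):

```python
# Hypothetical sketch of object injection/retrieval; a plain list stands
# in for heavier objects like DataFrames or database connections.

class Runtime:
    def __init__(self):
        self._ns = {}

    def inject(self, name, obj):
        # Host app hands a live object to the runtime instead of
        # serializing it into the LLM's text context.
        self._ns[name] = obj

    def execute(self, code):
        # LLM-generated code manipulates injected objects in place.
        exec(code, self._ns)

    def retrieve(self, name):
        # Processed data flows losslessly back to the downstream app.
        return self._ns[name]

rt = Runtime()
rt.inject("records", [{"sku": "A", "qty": 3}, {"sku": "B", "qty": 7}])
rt.execute("restock = [r['sku'] for r in records if r['qty'] < 5]")
print(rt.retrieve("restock"))  # ['A']
```

In this scheme only the short generated code and a compact result summary occupy the text channel; the data itself never inflates the prompt, which is how the approach avoids the context-overflow failures attributed to JSON-based and text-bound code-based agents.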
Problem

Research questions and friction points this paper is trying to address.

LLM-based agents
context drift
stateful runtime
long-horizon tasks
external memory
Innovation

Methods, ideas, or system contributions that make the work stand out.

Stateful Runtime Management
Dual-stream Context Architecture
LLM-as-Runtime-Operator
Persistent Python Objects
Context Drift Elimination
Maohao Ran
HKUST
Zhenglin Wan
NUS
Cooper Lin
HKU
Yanting Zhang
Donghua University
Hongyu Xin
HKUST
Hongwei Fan
Peking University
Robotics, 3D Vision
Yibo Xu
HKUST
Beier Luo
HKU
Yaxin Zhou
CMU
Wangbo Zhao
National University of Singapore
Efficient Deep Learning, Dynamic Neural Network, Multimodal Model
Lijie Yang
Princeton
Lang Feng
Nanyang Technological University
Reinforcement Learning
Fuchao Yang
NTU, Singapore
Jingxuan Wu
UNC, Chapel Hill
Yiqiao Huang
Harvard
Chendong Ma
HKBU
Dailing Jiang
HKBU
Jianbo Deng
HKUST
Sihui Han
HKUST
Bo An
Nanyang Technological University
Artificial Intelligence, Multi-agent Systems, Game Theory, Reinforcement Learning, Optimization
Yike Guo
HKUST
Jun Song
Shenzhen University
Nanophotonics