🤖 AI Summary
Static prompts fail large language model (LLM) agents that operate in dynamic, massive-context environments: the agents can neither correct recurring errors nor improve their capabilities effectively. To address this, the paper proposes SCOPE, an online prompt evolution framework driven by execution trajectories. Methodologically, SCOPE integrates online optimization, execution trace analysis, neuralized prompt tuning, and multi-perspective strategy sampling. Its key contributions are: (1) a dual-stream prompt evolution mechanism that jointly optimizes tactical-level real-time error correction and strategic-level cross-task generalization; and (2) a perspective-driven exploration strategy that enables autonomous context management without human intervention. On the HLE benchmark, SCOPE improves the task success rate from 14.23% to 38.64%. The implementation is publicly available.
📝 Abstract
Large Language Model (LLM) agents are increasingly deployed in environments that generate massive, dynamic contexts. However, a critical bottleneck remains: while agents have access to this context, their static prompts lack the mechanisms to manage it effectively, leading to recurring Corrective and Enhancement failures. To address this capability gap, we introduce SCOPE (Self-evolving Context Optimization via Prompt Evolution). SCOPE frames context management as an online optimization problem, synthesizing guidelines from execution traces to automatically evolve the agent's prompt. We propose a Dual-Stream mechanism that balances tactical specificity (resolving immediate errors) with strategic generality (evolving long-term principles). Furthermore, we introduce Perspective-Driven Exploration to maximize strategy coverage, increasing the likelihood that the agent has the correct strategy for any given task. Experiments on the HLE benchmark show that SCOPE improves task success rates from 14.23% to 38.64% without human intervention. We make our code publicly available at https://github.com/JarvisPei/SCOPE.
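
As a rough illustration of the dual-stream idea described in the abstract, the sketch below keeps two guideline lists: a tactical one that is cleared at the start of each task and a strategic one that persists across tasks. This is not the authors' implementation. The class `DualStreamPrompt`, the function `toy_synthesize`, and the rule-based trace matching are all hypothetical stand-ins; in particular, a real system would presumably synthesize guidelines with an LLM call rather than with keyword checks.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class DualStreamPrompt:
    """Toy prompt holder with two guideline streams (hypothetical structure)."""
    base_prompt: str
    strategic: List[str] = field(default_factory=list)  # persists across tasks
    tactical: List[str] = field(default_factory=list)   # cleared per task

    def render(self) -> str:
        """Assemble the prompt text from the base prompt and both streams."""
        lines = [self.base_prompt]
        lines += [f"[strategic] {g}" for g in self.strategic]
        lines += [f"[tactical] {g}" for g in self.tactical]
        return "\n".join(lines)

    def start_task(self) -> None:
        """Drop task-specific guidelines; keep the long-term ones."""
        self.tactical.clear()

    def update(self, trace: str,
               synthesize: Callable[[str], Tuple[str, bool]]) -> None:
        """Turn one execution trace into at most one new guideline."""
        guideline, is_general = synthesize(trace)
        if not guideline:
            return
        target = self.strategic if is_general else self.tactical
        if guideline not in target:  # avoid duplicate guidelines
            target.append(guideline)


def toy_synthesize(trace: str) -> Tuple[str, bool]:
    """Stand-in for an LLM call: maps a trace to (guideline, is_general)."""
    if "timeout" in trace:
        return ("Retry the failed call with a smaller query.", False)
    if "unverified" in trace:
        return ("Cross-check facts against a second source.", True)
    return ("", False)


prompt = DualStreamPrompt("You are a research agent.")
prompt.update("tool error: timeout", toy_synthesize)  # goes to tactical
prompt.update("answer unverified", toy_synthesize)    # goes to strategic
prompt.start_task()                                   # new task: tactical reset
print(prompt.render())
```

After `start_task()`, only the strategic guideline remains in the rendered prompt, which mirrors the split between resolving immediate errors and evolving long-term principles.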