MOSS: Self-Evolution through Source-Level Rewriting in Autonomous Agent Systems

📅 2026-05-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing autonomous agent systems do not learn from user interactions after deployment, and their structural failures cannot be resolved by adjusting prompts or other text-based configuration alone. The authors propose MOSS, a source-level self-evolution framework that extends an agent's self-repair to its own program code. The system automatically collects and clusters failure cases from production, which triggers a deterministic multi-stage evolution pipeline in which a pluggable external coding agent generates the code modifications. Candidates are validated by replaying the failure batch in ephemeral trial workers, then deployed through a user-consent-gated, in-place container swap backed by health-probe-gated rollback. Because the changes live in code rather than in textual artifacts, the approach is Turing-complete, takes effect deterministically, and resists long-context drift. On OpenClaw, MOSS raises the four-task mean grader score from 0.25 to 0.61 in a single evolution cycle without human intervention.
📝 Abstract
Autonomous agentic systems are largely static after deployment: they do not learn from user interactions, and recurring failures persist until the next human-driven update ships a fix. Self-evolving agents have emerged in response, but all confine evolution to text-mutable artifacts -- skill files, prompt configurations, memory schemas, workflow graphs -- and leave the agent harness untouched. Since routing, hook ordering, state invariants, and dispatch live in code rather than in any text artifact, an entire class of structural failure is physically unreachable from the text layer. We argue that source-level adaptation is a fundamentally more general medium: it is Turing-complete, a strict superset of every text-mutable scope, takes effect deterministically rather than through base-model compliance, and does not erode under long-context drift. We present MOSS, a system that performs self-rewriting at the source level on production agentic substrates. Each evolution is anchored to an automatically curated batch of production-failure evidence and proceeds through a deterministic multi-stage pipeline; code modification is delegated to a pluggable external coding-agent CLI while MOSS retains stage ordering and verdicts. Candidates are verified by replaying the batch against the candidate image in ephemeral trial workers, then promoted via user-consent-gated, in-place container swap with health-probe-gated rollback. On OpenClaw, MOSS lifts a four-task mean grader score from 0.25 to 0.61 in a single cycle without human intervention.
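The stage ordering described in the abstract can be sketched as a small Python function. This is only an illustrative sketch, not the MOSS implementation: the names `FailureCase`, `run_evolution_cycle`, `propose_patch`, `replay`, `user_consents`, and `health_probe` are hypothetical, and the acceptance rule (the replayed mean score must exceed the production baseline) is an assumption, since the abstract does not state how verdicts are computed.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class FailureCase:
    """One production failure in the evidence batch (hypothetical schema)."""
    task_id: str
    baseline_score: float  # grader score observed on the current image


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def run_evolution_cycle(
    batch: Sequence[FailureCase],
    current_image: str,
    propose_patch: Callable[[str, Sequence[FailureCase]], str],
    replay: Callable[[str, FailureCase], float],
    user_consents: Callable[[str], bool],
    health_probe: Callable[[str], bool],
) -> str:
    """Run one cycle and return the image that is live afterwards."""
    if not batch:
        return current_image  # nothing to anchor an evolution to

    # Stage 1: code modification is delegated to an external coding agent;
    # the orchestrator only receives the resulting candidate image.
    candidate_image = propose_patch(current_image, batch)

    # Stage 2: replay the same failure batch against the candidate
    # (in the paper this happens in ephemeral trial workers).
    baseline = _mean([case.baseline_score for case in batch])
    trial = _mean([replay(candidate_image, case) for case in batch])
    if trial <= baseline:  # assumed verdict rule, not from the paper
        return current_image

    # Stage 3: promotion is gated on user consent.
    if not user_consents(candidate_image):
        return current_image

    # Stage 4: in-place swap, then roll back if the health probe fails.
    live_image = candidate_image
    if not health_probe(live_image):
        live_image = current_image
    return live_image
```

The orchestrator keeps control of stage order and verdicts, while the callables stand in for the pluggable pieces (coding-agent CLI, trial workers, consent prompt, health probe), which mirrors the division of responsibility the abstract describes.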
Problem

Research questions and friction points this paper is trying to address.

autonomous agent systems
self-evolution
source-level rewriting
structural failure
production-failure evidence
Innovation

Methods, ideas, or system contributions that make the work stand out.

source-level rewriting
self-evolving agents
autonomous agent systems
deterministic evolution
production-failure adaptation
Qianshu Cai
MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China
Yonggang Zhang
The Hong Kong University of Science and Technology
Xianzhang Jia
The Hong Kong University of Science and Technology
Wei Xue
Department of Applied Plant Science, Chonnam National University
Crop ecophysiology modelling, climate change
Jun Song
Shenzhen University
nanophotonics
Xinmei Tian
University of Science and Technology of China
Multimedia Information Retrieval
Yike Guo
The Hong Kong University of Science and Technology