Live-SWE-agent: Can Software Engineering Agents Self-Evolve on the Fly?

📅 2025-11-17

🤖 AI Summary
Existing LLM-based software engineering agents rely on manually designed scaffolds, and recent self-improving agents require costly offline training, which limits how well they generalize across LLMs and benchmarks. This paper introduces Live-SWE-agent, which the authors describe as the first live software agent that evolves itself autonomously and continuously at runtime, with no offline training. Starting from a minimal scaffold with access only to bash tools (e.g., mini-SWE-agent), it modifies its own scaffold implementation while solving real-world software tasks. On SWE-bench Verified it reaches a 75.4% solve rate without test-time scaling, outperforming all existing open-source software agents and approaching the best proprietary solution. On the more recent SWE-Bench Pro it achieves the best-known solve rate of 45.8%, ahead of state-of-the-art manually crafted agents.

📝 Abstract
Large Language Models (LLMs) are reshaping almost all industries, including software engineering. In recent years, a number of LLM agents have been proposed to solve real-world software problems. Such software agents are typically equipped with a suite of coding tools and can autonomously decide the next actions to form complete trajectories to solve end-to-end software tasks. While promising, they typically require dedicated design and may still be suboptimal, since it can be extremely challenging and costly to exhaust the entire agent scaffold design space. Recognizing that software agents are inherently software themselves that can be further refined or modified, researchers have recently proposed a number of self-improving software agents, including the Darwin-Gödel Machine (DGM). However, such self-improving agents require costly offline training on specific benchmarks and may not generalize well across different LLMs or benchmarks. In this paper, we propose Live-SWE-agent, the first live software agent that can autonomously and continuously evolve itself on-the-fly during runtime when solving real-world software problems. More specifically, Live-SWE-agent starts with the most basic agent scaffold with access only to bash tools (e.g., mini-SWE-agent), and autonomously evolves its own scaffold implementation while solving real-world software problems. Our evaluation on the widely studied SWE-bench Verified benchmark shows that Live-SWE-agent can achieve an impressive solve rate of 75.4% without test-time scaling, outperforming all existing open-source software agents and approaching the performance of the best proprietary solution. Moreover, Live-SWE-agent outperforms state-of-the-art manually crafted software agents on the recent SWE-Bench Pro benchmark, achieving the best-known solve rate of 45.8%.
Problem

Research questions and friction points this paper is trying to address.

Developing software agents that autonomously evolve during runtime
Overcoming limitations of manually designed static agent scaffolds
Eliminating costly offline training requirements for agent improvement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Autonomously evolves agent scaffold during runtime
Starts with basic bash tools and self-improves
Achieves a 75.4% solve rate on SWE-bench Verified and 45.8% on SWE-Bench Pro, without test-time scaling
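
The idea of an agent that starts with bash access only and extends its own scaffold at runtime can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not describe the mechanism, so the example assumes one plausible reading, in which the agent may write a helper script during a task and then call it from later bash commands. All names here (`run_agent`, `create_tool`, `scripted_policy`, the `TOOL_DIR` and `PYTHON` environment variables) are hypothetical, the LLM is replaced by a fixed scripted policy, and a POSIX shell is assumed.

```python
import os
import subprocess
import sys
import tempfile
from typing import Callable, List, Tuple

Action = Tuple[str, ...]          # ("bash", cmd) | ("create_tool", name, source) | ("submit",)
Step = Tuple[Action, str]         # (action taken, observation returned)


def run_bash(command: str, tool_dir: str) -> str:
    """Run a shell command; expose the tool directory and interpreter via env vars."""
    env = dict(os.environ, TOOL_DIR=tool_dir, PYTHON=sys.executable)
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, env=env, timeout=30
    )
    return (result.stdout + result.stderr).strip()


def create_tool(name: str, source: str, tool_dir: str) -> str:
    """Persist a script written by the agent so later steps can call it."""
    with open(os.path.join(tool_dir, name), "w") as handle:
        handle.write(source)
    return f"created tool {name}"


def run_agent(policy: Callable[[List[Step]], Action], max_steps: int = 10) -> List[Step]:
    """Loop: ask the policy for an action, execute it, append the observation."""
    history: List[Step] = []
    with tempfile.TemporaryDirectory() as tool_dir:
        for _ in range(max_steps):
            action = policy(history)
            if action[0] == "submit":
                break
            if action[0] == "create_tool":
                observation = create_tool(action[1], action[2], tool_dir)
            else:  # plain bash command
                observation = run_bash(action[1], tool_dir)
            history.append((action, observation))
    return history


def scripted_policy(history: List[Step]) -> Action:
    """Stand-in for the LLM: first create a tool, then use it, then stop."""
    script: List[Action] = [
        ("create_tool", "shout.py",
         "import sys\nprint(' '.join(sys.argv[1:]).upper())\n"),
        ("bash", '"$PYTHON" "$TOOL_DIR/shout.py" hello'),
    ]
    return script[len(history)] if len(history) < len(script) else ("submit",)


if __name__ == "__main__":
    for action, observation in run_agent(scripted_policy):
        print(action[0], "->", observation)
```

In this sketch the only built-in capability is running shell commands; the `shout.py` tool does not exist until the policy creates it mid-run, after which it is usable like any other command. A real system would replace `scripted_policy` with an LLM call and would need sandboxing, which the sketch omits.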