Self-Abstraction from Grounded Experience for Plan-Guided Policy Refinement

๐Ÿ“… 2025-11-08
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Existing LLM-based agents show limited performance on multi-step reasoning and code-modification tasks in software engineering, largely because static execution frameworks give them no mechanism for self-abstraction or policy refinement from experience. This work proposes SAGE, a framework that automatically induces high-level plan abstractions from real execution traces and closes the loop by feeding them back as contextual guidance for subsequent executions, breaking the constraints of static agent designs. SAGE combines execution-trace analysis, abstraction extraction, and contextual feedback to enable continuous policy refinement without retraining the underlying model. Evaluated on the SWE-Bench Verified benchmark, SAGE achieves Pass@1 resolve rates of 73.2%–74.0% and yields a 7.2% relative improvement over the strong Mini-SWE-Agent baseline with the GPT-5 (high) backbone, improving planning robustness and code-generation accuracy on complex, long-horizon software engineering tasks.

๐Ÿ“ Abstract
Large language model (LLM) based agents are increasingly used to tackle software engineering tasks that require multi-step reasoning and code modification, demonstrating promising yet limited performance. However, most existing LLM agents operate within static execution frameworks, lacking a principled mechanism to learn and self-improve from their own experience and past rollouts. As a result, their performance remains bounded by the initial framework design and the underlying LLM's capabilities. We propose Self-Abstraction from Grounded Experience (SAGE), a framework that enables agents to learn from their own task executions and refine their behavior through self-abstraction. After an initial rollout, the agent induces a concise plan abstraction from its grounded experience, distilling key steps, dependencies, and constraints. This learned abstraction is then fed back as contextual guidance, refining the agent's policy and supporting more structured, informed subsequent executions. Empirically, SAGE delivers consistent performance gains across diverse LLM backbones and agent architectures. Notably, it yields a 7.2% relative performance improvement over the strong Mini-SWE-Agent baseline when paired with the GPT-5 (high) backbone. SAGE further achieves strong overall performance on the SWE-Bench Verified benchmark, reaching 73.2% and 74.0% Pass@1 resolve rates with the Mini-SWE-Agent and OpenHands CodeAct agent frameworks, respectively.
Problem

Research questions and friction points this paper is trying to address.

LLM agents lack learning from experience for self-improvement
Static execution frameworks limit agent performance and adaptability
Need principled mechanism to refine policies through grounded experience
Innovation

Methods, ideas, or system contributions that make the work stand out.

Agents learn from their own task executions
They induce plan abstractions from grounded experiences
Abstractions refine policies for improved subsequent executions
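The rollout-abstract-refine loop above can be sketched in a few lines. This is a hypothetical illustration of the idea, not the authors' implementation: the `agent` interface, trace fields, and `PlanAbstraction` structure are all assumed names for exposition.

```python
from dataclasses import dataclass


@dataclass
class PlanAbstraction:
    """Concise summary distilled from a rollout: steps, dependencies, constraints."""
    steps: list
    dependencies: list
    constraints: list


def induce_abstraction(trace):
    # Keep only the events the agent marked as important; treat their
    # order as step dependencies, and collect any recorded constraints.
    steps = [e["action"] for e in trace if e.get("important")]
    dependencies = list(zip(steps, steps[1:]))
    constraints = [e["note"] for e in trace if e.get("kind") == "constraint"]
    return PlanAbstraction(steps, dependencies, constraints)


def sage_loop(agent, task):
    trace = agent.rollout(task)                # 1) initial grounded rollout
    plan = induce_abstraction(trace)           # 2) self-abstraction from the trace
    return agent.rollout(task, guidance=plan)  # 3) plan-guided refined execution
```

In the paper's setting the abstraction step is performed by the LLM itself rather than by rule-based filtering as shown here; the sketch only conveys the control flow of feeding a distilled plan back as context.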
๐Ÿ”Ž Similar Papers
No similar papers found.