HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale

📅 2024-09-09

🏛️ arXiv.org

📈 Citations: 20

✨ Influential: 0

🤖 AI Summary

Current LLM-based software engineering (SE) agents are predominantly single-task and single-language, lacking generality and end-to-end collaborative capability. To address this, we propose HyperAgent, a general-purpose multi-agent system for SE tasks built on a four-role collaborative architecture (*Planner*, *Navigator*, *Code Editor*, and *Executor*) that supports GitHub issue resolution, repository-level code generation, and fault localization and program repair across programming languages. The system integrates hierarchical task decomposition, semantic code navigation, incremental editing, and sandboxed execution. Evaluated on three benchmarks (SWE-Bench, RepoExec, and Defects4J), it outperforms robust baselines on GitHub issue resolution and often surpasses state-of-the-art baselines in repository-level code generation, fault localization, and program repair.

📝 Abstract

Large Language Models (LLMs) have revolutionized software engineering (SE), showcasing remarkable proficiency in various coding tasks. Despite recent advancements that have enabled the creation of autonomous software agents utilizing LLMs for end-to-end development tasks, these systems are typically designed for specific SE functions. We introduce HyperAgent, an innovative generalist multi-agent system designed to tackle a wide range of SE tasks across different programming languages by mimicking the workflows of human developers. HyperAgent features four specialized agents (Planner, Navigator, Code Editor, and Executor) capable of handling the entire lifecycle of SE tasks, from initial planning to final verification. HyperAgent sets new benchmarks in diverse SE tasks, including GitHub issue resolution on the renowned SWE-Bench benchmark, outperforming robust baselines. Furthermore, HyperAgent demonstrates exceptional performance in repository-level code generation (RepoExec) and fault localization and program repair (Defects4J), often surpassing state-of-the-art baselines.
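
The four-role workflow described in the abstract can be pictured with a small toy pipeline. This is only an illustrative sketch, not HyperAgent's implementation: the `Task` structure, the function names, the in-memory `repo` dictionary, and the hard-coded fix are all hypothetical stand-ins for what would be LLM-driven agents working on a real repository.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Repo = Dict[str, str]  # path -> source text (toy in-memory repository, an assumption)


@dataclass
class Task:
    symbol: str                     # function named in the (hypothetical) issue
    fix: Callable[[str], str]       # stand-in for an LLM-proposed edit
    check: Callable[[dict], bool]   # stand-in for running a test
    log: List[str] = field(default_factory=list)


def plan(task: Task) -> List[str]:
    # Planner: a fixed three-step plan; a real planner would be LLM-driven.
    return ["navigate", "edit", "execute"]


def navigate(repo: Repo, task: Task) -> List[str]:
    # Navigator: find files defining the symbol (plain substring search).
    return sorted(p for p, src in repo.items() if f"def {task.symbol}(" in src)


def edit(repo: Repo, paths: List[str], task: Task) -> Repo:
    # Code Editor: return a patched copy, leaving the input repo untouched.
    patched = dict(repo)
    for p in paths:
        patched[p] = task.fix(patched[p])
    return patched


def execute(repo: Repo, paths: List[str], task: Task) -> bool:
    # Executor: load the patched files and run the check.
    # exec() is NOT a sandbox; it only keeps this toy self-contained.
    namespace: dict = {}
    for p in paths:
        exec(repo[p], namespace)
    return task.check(namespace)


def run(repo: Repo, task: Task) -> bool:
    # Drive the roles in the order the plan dictates and record each step.
    paths: List[str] = []
    patched = repo
    ok = False
    for step in plan(task):
        task.log.append(step)
        if step == "navigate":
            paths = navigate(repo, task)
        elif step == "edit":
            patched = edit(repo, paths, task)
        elif step == "execute":
            ok = execute(patched, paths, task)
    return ok


repo = {
    "calc.py": "def add(a, b):\n    return a - b\n",  # buggy on purpose
    "util.py": "def noop():\n    pass\n",
}
task = Task(
    symbol="add",
    fix=lambda src: src.replace("a - b", "a + b"),
    check=lambda ns: ns["add"](2, 3) == 5,
)
result = run(repo, task)
```

Keeping each role as a separate function with a narrow input and output mirrors the division of labor the abstract describes, from initial planning to final verification; how HyperAgent's agents actually communicate is not specified in this summary.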

Problem

Research questions and friction points this paper is trying to address.

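Existing LLM-based agents are typically designed for specific SE functions rather than diverse software engineering tasks

Prior systems lack end-to-end coverage of the lifecycle from planning to verification

Whether a generalist system can outperform baselines in GitHub issue resolution and code generation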

Innovation

Methods, ideas, or system contributions that make the work stand out.

Generalist multi-agent system for diverse SE tasks

Four specialized agents handle full development lifecycle

Outperforms baselines in issue resolution and code generation

🔎 Similar Papers

System for systematic literature review using multiple AI agents: Concept and an empirical evaluation