HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale

📅 2024-09-09
🏛️ arXiv.org
📈 Citations: 20
Influential: 0
🤖 AI Summary
Current LLM-based software engineering (SE) agents are predominantly single-task and single-language, which limits their generality and end-to-end capability. To address this, the paper proposes HyperAgent, a generalist multi-agent system with four specialized agents (Planner, Navigator, Code Editor, and Executor) that together cover SE tasks across programming languages, including GitHub issue resolution, repository-level code generation, fault localization, and program repair. The system integrates hierarchical task decomposition, semantic code navigation, incremental editing, and sandboxed execution. Evaluated on three benchmarks (SWE-Bench, RepoExec, and Defects4J), it outperforms robust baselines on GitHub issue resolution and often surpasses state-of-the-art baselines in code generation, fault localization, and program repair.

📝 Abstract
Large Language Models (LLMs) have revolutionized software engineering (SE), showcasing remarkable proficiency in various coding tasks. Despite recent advancements that have enabled the creation of autonomous software agents utilizing LLMs for end-to-end development tasks, these systems are typically designed for specific SE functions. We introduce HyperAgent, an innovative generalist multi-agent system designed to tackle a wide range of SE tasks across different programming languages by mimicking the workflows of human developers. HyperAgent features four specialized agents (Planner, Navigator, Code Editor, and Executor) capable of handling the entire lifecycle of SE tasks, from initial planning to final verification. HyperAgent sets new benchmarks in diverse SE tasks, including GitHub issue resolution on the renowned SWE-Bench benchmark, outperforming robust baselines. Furthermore, HyperAgent demonstrates exceptional performance in repository-level code generation (RepoExec) and fault localization and program repair (Defects4J), often surpassing state-of-the-art baselines.
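To make the four-agent lifecycle concrete, here is a minimal sketch of how a Planner, Navigator, Code Editor, and Executor might hand work to one another. It is not HyperAgent's implementation: every name (`TaskState`, `run_pipeline`, and the four role functions) is hypothetical, and each LLM-driven step is replaced by a trivial deterministic stand-in (a fixed plan, substring search, literal replacement, and a caller-supplied check).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class TaskState:
    """Shared state passed between the four roles (hypothetical structure)."""
    issue: str
    repo: Dict[str, str]  # path -> file contents
    plan: List[str] = field(default_factory=list)
    relevant_files: List[str] = field(default_factory=list)
    verified: bool = False


def planner(state: TaskState) -> None:
    # Stand-in for an LLM planning call: always emits the same three steps.
    state.plan = ["navigate", "edit", "execute"]


def navigator(state: TaskState, keyword: str) -> None:
    # Stand-in for code navigation: substring match over file contents.
    state.relevant_files = sorted(
        path for path, src in state.repo.items() if keyword in src
    )


def code_editor(state: TaskState, old: str, new: str) -> None:
    # Stand-in for patch generation: literal replacement in located files.
    for path in state.relevant_files:
        state.repo[path] = state.repo[path].replace(old, new)


def executor(state: TaskState, check: Callable[[Dict[str, str]], bool]) -> None:
    # Stand-in for running tests in a sandbox: a caller-supplied predicate.
    state.verified = check(state.repo)


def run_pipeline(
    issue: str,
    repo: Dict[str, str],
    keyword: str,
    old: str,
    new: str,
    check: Callable[[Dict[str, str]], bool],
) -> TaskState:
    # Work on a copy of the mapping so the caller's repo is left untouched.
    state = TaskState(issue=issue, repo=dict(repo))
    planner(state)
    for step in state.plan:
        if step == "navigate":
            navigator(state, keyword)
        elif step == "edit":
            code_editor(state, old, new)
        elif step == "execute":
            executor(state, check)
    return state


# Toy usage: the "issue" is that add() subtracts instead of adding.
demo_repo = {"calc.py": "def add(a, b):\n    return a - b\n", "util.py": "X = 1\n"}
demo_state = run_pipeline(
    "add() subtracts", demo_repo, "def add", "a - b", "a + b",
    lambda files: "a + b" in files["calc.py"],
)
```

Keeping a single shared state object is only one possible design; it makes the hand-off between roles explicit, at the cost of hiding the iterative back-and-forth a real agent system would likely need.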
Problem

Research questions and friction points this paper is trying to address.

Generalist multi-agent system for diverse software engineering tasks
Handles entire lifecycle from planning to verification
Outperforms baselines in GitHub issue resolution and code generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generalist multi-agent system for diverse SE tasks
Four specialized agents handle full development lifecycle
Outperforms baselines on issue resolution and code generation benchmarks
H. N. Phan
FPT Software AI Center, Viet Nam
Phong X. Nguyen
FPT Software AI Center, Viet Nam
Nghi D. Q. Bui
Unknown affiliation
AI4Code, Software Engineering, Code Agent, AI4SE, RL