🤖 AI Summary
Current LLM-based software engineering (SE) agents are predominantly single-task and single-language, so they lack generality and end-to-end collaborative capability. To address this, we propose HyperAgent, a general-purpose multi-agent system for SE tasks built on a four-role collaborative architecture (*Planning*, *Navigation*, *Coding*, and *Execution*) that supports GitHub issue resolution, repository-level code generation, fault localization, and program repair across programming languages. The system integrates hierarchical task decomposition, semantic code navigation, incremental editing, and sandboxed execution. Evaluated on three benchmarks (SWE-Bench, RepoExec, and Defects4J), it outperforms strong baselines on GitHub issue resolution and often surpasses state-of-the-art baselines in repository-level code generation, fault localization, and program repair.
📝 Abstract
Large Language Models (LLMs) have revolutionized software engineering (SE), showcasing remarkable proficiency in various coding tasks. Despite recent advancements that have enabled the creation of autonomous software agents utilizing LLMs for end-to-end development tasks, these systems are typically designed for specific SE functions. We introduce HyperAgent, an innovative generalist multi-agent system designed to tackle a wide range of SE tasks across different programming languages by mimicking the workflows of human developers. HyperAgent features four specialized agents (Planner, Navigator, Code Editor, and Executor) capable of handling the entire lifecycle of SE tasks, from initial planning to final verification. HyperAgent sets new benchmarks in diverse SE tasks, including GitHub issue resolution on the renowned SWE-Bench benchmark, where it outperforms strong baselines. Furthermore, HyperAgent demonstrates exceptional performance in repository-level code generation (RepoExec) and in fault localization and program repair (Defects4J), often surpassing state-of-the-art baselines.
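To make the four-agent lifecycle concrete, the following is a minimal sketch of one way such a hand-off could be wired together. It is not HyperAgent's implementation: the `TaskState` fields, the four stub functions, and the strictly linear ordering are all assumptions made for illustration, and the stubs return placeholder strings instead of calling an LLM, searching a repository, or running tests.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TaskState:
    """Hypothetical shared state passed between the four roles."""
    issue: str
    plan: List[str] = field(default_factory=list)
    locations: List[str] = field(default_factory=list)
    patch: str = ""
    verified: bool = False
    log: List[str] = field(default_factory=list)


def planner(state: TaskState) -> TaskState:
    # Assumption: the planner breaks the task into coarse steps.
    state.plan = ["locate relevant code", "edit code", "run checks"]
    state.log.append("planner")
    return state


def navigator(state: TaskState) -> TaskState:
    # Placeholder: a real navigator would search the repository.
    state.locations = [f"<file relevant to: {state.issue}>"]
    state.log.append("navigator")
    return state


def code_editor(state: TaskState) -> TaskState:
    # Placeholder: a real editor would produce an actual diff.
    state.patch = f"<patch touching {len(state.locations)} location(s)>"
    state.log.append("code_editor")
    return state


def executor(state: TaskState) -> TaskState:
    # Placeholder: a real executor would run tests on the patched code.
    state.verified = bool(state.patch)
    state.log.append("executor")
    return state


# Assumption: roles run once, in order, from planning to verification.
PIPELINE: List[Callable[[TaskState], TaskState]] = [
    planner,
    navigator,
    code_editor,
    executor,
]


def run(issue: str) -> TaskState:
    state = TaskState(issue=issue)
    for role in PIPELINE:
        state = role(state)
    return state


if __name__ == "__main__":
    final = run("example issue text")
    print(final.log, final.verified)
```

A real system would likely let the planner re-dispatch work when verification fails, so the fixed list in `PIPELINE` is best read as the simplest possible ordering of the four roles.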