A Comprehensive Survey on Benchmarks and Solutions in Software Engineering of LLM-Empowered Agentic System

📅 2025-10-10
📈 Citations: 0 · Influential: 0
🤖 AI Summary
Problem: Current LLM-driven software engineering agent systems lack systematic alignment between benchmarks and methodologies, hindering rigorous evaluation and progress.
Method: We conduct a systematic literature review of over 150 papers, constructing a unified analytical framework encompassing 50+ benchmarks; we propose a three-dimensional taxonomy grounded in prompting, fine-tuning, and agent architecture, and distill a generic agent workflow comprising planning, reasoning, memory, and tool augmentation. We further establish, for the first time, a comprehensive mapping between task complexity and solution strategies, enabling synergistic integration of evaluation benchmarks and technical pathways.
Contribution/Results: Our work delivers a field-wide panoramic view, clarifies the applicability boundaries of major paradigms, provides actionable guidance for method selection, and identifies key frontiers—including multi-agent collaboration, self-evolving systems, and formal verification integration—as critical directions for breakthrough advancement.

📝 Abstract
The integration of LLMs into software engineering has catalyzed a paradigm shift from traditional rule-based systems to sophisticated agentic systems capable of autonomous problem-solving. Despite this transformation, the field lacks a comprehensive understanding of how benchmarks and solutions interconnect, hindering systematic progress and evaluation. This survey presents the first holistic analysis of LLM-empowered software engineering, bridging the critical gap between evaluation and solution approaches. We analyze 150+ recent papers and organize them into a comprehensive taxonomy spanning two major dimensions: (1) Solutions, categorized into prompt-based, fine-tuning-based, and agent-based paradigms, and (2) Benchmarks, covering code generation, translation, repair, and other tasks. Our analysis reveals how the field has evolved from simple prompt engineering to complex agentic systems incorporating planning and decomposition, reasoning and self-refinement, memory mechanisms, and tool augmentation. We present a unified pipeline that illustrates the complete workflow from task specification to final deliverables, demonstrating how different solution paradigms address varying complexity levels across software engineering tasks. Unlike existing surveys that focus on isolated aspects, we provide full-spectrum coverage connecting 50+ benchmarks with their corresponding solution strategies, enabling researchers to identify optimal approaches for specific evaluation criteria. Furthermore, we identify critical research gaps and propose actionable future directions, including multi-agent collaboration frameworks, self-evolving code generation systems, and integration of formal verification with LLM-based methods. This survey serves as a foundational resource for researchers and practitioners seeking to understand, evaluate, and advance LLM-empowered software engineering systems.
Problem

Research questions and friction points this paper is trying to address.

Bridging the gap between benchmarks and solutions in LLM-empowered software engineering
Analyzing the evolution from simple prompts to complex agentic systems
Providing full-spectrum coverage connecting benchmarks with solution strategies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzes prompt-based, fine-tuning-based, and agent-based solution paradigms
Proposes a unified pipeline for software engineering workflows, from task specification to final deliverables
Connects 50+ benchmarks with their corresponding solution strategies
Jiale Guo
Digital Trust Centre, Nanyang Technological University, Singapore
Suizhi Huang
Nanyang Technological University
Computer Vision · Federated Learning · Multi-task Learning
Mei Li
College of Computing and Data Science, Nanyang Technological University, Singapore
Dong Huang
School of Computing and Data Science, The University of Hong Kong, Hong Kong
Xingsheng Chen
School of Computing and Data Science, The University of Hong Kong, Hong Kong
Regina Zhang
School of Computer Science and Technology, The University of Cambridge, UK
Zhijiang Guo
HKUST (GZ) | HKUST
Natural Language Processing · Machine Learning · Large Language Models
Han Yu
College of Computing and Data Science, Nanyang Technological University, Singapore
Siu-Ming Yiu
Professor of Computer Science, The University of Hong Kong
Cybersecurity · Cryptography · FinTech · Bioinformatics
Christian Jensen
School of Computer Science, Aalborg University, Denmark
Pietro Lio
School of Computer Science and Technology, The University of Cambridge, UK
Kwok-Yan Lam
Nanyang Technological University
Cybersecurity · Privacy-Preserving Technologies · Digital Trust · Distributed Systems · LegalTech