Evaluating Classical Software Process Models as Coordination Mechanisms for LLM-Based Software Generation

📅 2025-09-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Existing research has not systematically investigated how traditional software process models govern collaboration among LLM-based multi-agent systems (MAS) for automated software development. Method: This study pioneers the use of the waterfall, V-model, and agile paradigms as coordination frameworks for LLM-driven MAS. Three process-specific MAS architectures were implemented with GPT-series models and evaluated under uniform experimental conditions using standardized metrics for code quality, generation overhead, and productivity. Contribution/Results: The empirical results reveal clear trade-offs: the waterfall model is the most efficient but the least adaptable; the V-model generates redundant code; and the agile model delivers the best correctness and maintainability at substantially higher computational cost. This work establishes the first systematic empirical foundation and coordination design paradigm for generative-AI-enabled software engineering, showing how process models mediate performance, scalability, and maintainability in LLM-MAS automation.

📝 Abstract

[Background] Large Language Model (LLM)-based multi-agent systems (MAS) are transforming software development by enabling autonomous collaboration. Classical software processes such as Waterfall, V-Model, and Agile offer structured coordination patterns that can be repurposed to guide these agent interactions. [Aims] This study explores how traditional software development processes can be adapted as coordination scaffolds for LLM-based MAS and examines their impact on code quality, cost, and productivity. [Method] We executed 11 diverse software projects under three process models and four GPT variants, totaling 132 runs. Each output was evaluated using standardized metrics for size (files, LOC), cost (execution time, token usage), and quality (code smells, AI- and human-detected bugs). [Results] Both process model and LLM choice significantly affected system performance. Waterfall was the most efficient, V-Model produced the most verbose code, and Agile achieved the highest code quality, albeit at higher computational cost. [Conclusions] Classical software processes can be effectively instantiated in LLM-based MAS, but each entails trade-offs across quality, cost, and adaptability. Process selection should reflect project goals, whether prioritizing efficiency, robustness, or structured validation.
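
The run count in the abstract follows from a full factorial design: 11 projects × 3 process models × 4 GPT variants = 132 runs. The sketch below enumerates that grid and attaches an empty metrics record to each run. It is an illustration only, not the authors' harness. The project and model labels are placeholders, because the abstract names neither the projects nor the specific GPT variants, and the `RunMetrics` field names are assumptions that mirror the metric groups listed above.

```python
from dataclasses import dataclass
from itertools import product

# Process models named in the abstract.
PROCESS_MODELS = ["Waterfall", "V-Model", "Agile"]

# Placeholder labels (assumption): the abstract does not name the
# four GPT variants or the 11 projects.
LLM_VARIANTS = [f"gpt-variant-{i}" for i in range(1, 5)]
PROJECTS = [f"project-{i:02d}" for i in range(1, 12)]


@dataclass
class RunMetrics:
    """Hypothetical per-run record; fields mirror the abstract's metric groups."""
    # Size
    files: int = 0
    loc: int = 0
    # Cost
    execution_time_s: float = 0.0
    tokens_used: int = 0
    # Quality
    code_smells: int = 0
    ai_detected_bugs: int = 0
    human_detected_bugs: int = 0


def build_run_grid():
    """Return every (project, process model, LLM variant) combination."""
    return list(product(PROJECTS, PROCESS_MODELS, LLM_VARIANTS))


runs = build_run_grid()
results = {run: RunMetrics() for run in runs}  # one empty record per run
print(len(runs))  # 11 * 3 * 4 = 132
```

Keying the results by the full (project, process, model) tuple keeps the two factors the abstract reports as significant, process model and LLM choice, separable when the metrics are aggregated.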

Problem

Research questions and friction points this paper is trying to address.

Adapting classical software processes for LLM-based multi-agent coordination

Evaluating impact on code quality, cost, and productivity metrics

Comparing Waterfall, V-Model, and Agile performance trade-offs

Innovation

Methods, ideas, or system contributions that make the work stand out.

Adapting classical software process models

Evaluating coordination mechanisms for LLM agents

Comparing Waterfall, V-Model, and Agile performance

🔎 Similar Papers

From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future