AI Summary
Existing research lacks systematic investigation into how traditional software process models govern collaboration among LLM-based multi-agent systems (MAS) for automated software development.
Method: This study pioneers the use of waterfall, V-model, and agile paradigms as coordination frameworks for LLM-driven MAS. Three process-specific MAS architectures were implemented using GPT-series models and evaluated under uniform experimental conditions using standardized metrics for code quality, generation overhead, and productivity.
Contribution/Results: Empirical results reveal critical trade-offs: the waterfall model achieves the highest efficiency but the lowest adaptability; the V-model incurs redundant code generation; and the agile model delivers superior correctness and maintainability at substantially higher computational cost. This work establishes the first systematic empirical foundation and coordination design paradigm for generative-AI-enabled software engineering, elucidating how process models mediate performance, scalability, and maintainability in LLM-MAS automation.
Abstract
[Background] Large Language Model (LLM)-based multi-agent systems (MAS) are transforming software development by enabling autonomous collaboration. Classical software processes such as Waterfall, V-Model, and Agile offer structured coordination patterns that can be repurposed to guide these agent interactions. [Aims] This study explores how traditional software development processes can be adapted as coordination scaffolds for LLM-based MAS and examines their impact on code quality, cost, and productivity. [Method] We executed 11 diverse software projects under three process models and four GPT variants, totaling 132 runs. Each output was evaluated using standardized metrics for size (files, LOC), cost (execution time, token usage), and quality (code smells, AI- and human-detected bugs). [Results] Both process model and LLM choice significantly affected system performance. Waterfall was the most efficient, V-Model produced the most verbose code, and Agile achieved the highest code quality, albeit at higher computational cost. [Conclusions] Classical software processes can be effectively instantiated in LLM-based MAS, but each entails trade-offs across quality, cost, and adaptability. Process selection should reflect project goals, whether prioritizing efficiency, robustness, or structured validation.
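The run count in the Method follows from a full factorial design: 11 projects × 3 process models × 4 GPT variants = 132 runs. The sketch below enumerates such a grid as a minimal illustration. The project and model labels are placeholders, because the abstract gives only the counts, and the dictionary layout is an assumption rather than the authors' actual harness.

```python
from itertools import product

# Placeholder labels: the abstract reports only how many projects and
# GPT variants were used, not their names.
PROJECTS = [f"project_{i:02d}" for i in range(1, 12)]   # 11 projects
PROCESSES = ["waterfall", "v_model", "agile"]            # 3 process models
MODELS = [f"gpt_variant_{i}" for i in range(1, 5)]       # 4 GPT variants


def build_run_grid():
    """Return one run configuration per (project, process, model) combination."""
    return [
        {"project": project, "process": process, "model": model}
        for project, process, model in product(PROJECTS, PROCESSES, MODELS)
    ]


if __name__ == "__main__":
    runs = build_run_grid()
    print(len(runs))  # 132
```

Crossing every factor level with every other is what lets effects of the process model and of the LLM choice be compared on the same set of projects.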