Towards Engineering Multi-Agent LLMs: A Protocol-Driven Approach

📅 2025-10-13
🤖 AI Summary
Existing LLM-driven multi-agent systems (MAS) for collaborative software development suffer from three critical issues: under-specification, coordination misalignment, and inappropriate verification. All three stem from neglecting structured software engineering principles. To address them, we propose SEMAP (Software Engineering Multi-Agent Protocol), a protocol layer that integrates behavioral contract modeling, structured message passing, and lifecycle-guided, phase-wise verification into the MAS architecture. SEMAP is implemented atop Google's Agent-to-Agent (A2A) infrastructure and evaluated with the Multi-Agent System Failure Taxonomy (MAST). In code development tasks, total failures decrease by up to 69.6% at the function level and 56.7% at the deployment level; in vulnerability detection, failure counts drop by up to 47.4% (Python) and 28.2% (C/C++). This work offers a verifiable, constraint-aware, and engineering-friendly protocol design paradigm for LLM-based multi-agent systems.

📝 Abstract
The increasing demand for software development has driven interest in automating software engineering (SE) tasks using Large Language Models (LLMs). Recent efforts extend LLMs into multi-agent systems (MAS) that emulate collaborative development workflows, but these systems often fail due to three core deficiencies: under-specification, coordination misalignment, and inappropriate verification, arising from the absence of foundational SE structuring principles. This paper introduces Software Engineering Multi-Agent Protocol (SEMAP), a protocol-layer methodology that instantiates three core SE design principles for multi-agent LLMs: (1) explicit behavioral contract modeling, (2) structured messaging, and (3) lifecycle-guided execution with verification, and is implemented atop Google's Agent-to-Agent (A2A) infrastructure. Empirical evaluation using the Multi-Agent System Failure Taxonomy (MAST) framework demonstrates that SEMAP effectively reduces failures across different SE tasks. In code development, it achieves up to a 69.6% reduction in total failures for function-level development and 56.7% for deployment-level development. For vulnerability detection, SEMAP reduces failure counts by up to 47.4% on Python tasks and 28.2% on C/C++ tasks.
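The percentages above read as relative reductions in failure counts against a baseline system. A minimal sketch of that arithmetic follows, assuming the conventional definition (baseline minus protocol, divided by baseline); the function name and the counts in the usage line are hypothetical and are not taken from the paper.

```python
def relative_reduction(baseline_failures: int, protocol_failures: int) -> float:
    """Percentage drop in failure count relative to the baseline.

    Assumed definition: 100 * (baseline - protocol) / baseline.
    """
    if baseline_failures <= 0:
        raise ValueError("baseline_failures must be positive")
    return 100.0 * (baseline_failures - protocol_failures) / baseline_failures


# Hypothetical counts for illustration only: 50 failures without the
# protocol layer, 20 with it.
example = relative_reduction(50, 20)
print(f"{example:.1f}% fewer failures")
```

Because the measure is relative to each task's own baseline, the figures for different tasks (for example Python versus C/C++) are not directly comparable as absolute failure counts.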
Problem

Research questions and friction points this paper is trying to address.

Addresses multi-agent LLM failures from under-specification and misalignment
Introduces protocol-driven methodology with structured messaging and verification
Reduces system failures in software engineering tasks like code development
Innovation

Methods, ideas, or system contributions that make the work stand out.

Protocol-layer methodology for multi-agent LLM systems
Implements explicit contracts and structured messaging
Lifecycle-guided execution with verification reduces failures
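The three ideas listed above (explicit contracts, structured messages, phase-wise verification) can be sketched together in a few lines. This is an illustrative toy under stated assumptions, not SEMAP's actual schema and not the A2A API: `Phase`, `Message`, `Contract`, `ContractViolation`, and `run_phase` are all hypothetical names, and a contract is simplified to the payload keys an agent requires and must produce.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Phase(Enum):
    """Hypothetical lifecycle phases; the real protocol may define others."""
    DESIGN = "design"
    IMPLEMENT = "implement"
    TEST = "test"


@dataclass(frozen=True)
class Message:
    """Structured message: typed fields instead of free-form text."""
    sender: str
    phase: Phase
    payload: dict


@dataclass(frozen=True)
class Contract:
    """Simplified behavioral contract for one agent in one phase."""
    phase: Phase
    requires: frozenset  # keys the incoming payload must contain
    ensures: frozenset   # keys the agent's output must contain


class ContractViolation(Exception):
    """Raised when a message or an agent output breaks the contract."""


def run_phase(contract: Contract, incoming: Message, agent_name: str,
              agent: Callable[[dict], dict]) -> Message:
    """Check the input, run the agent, then check the output before hand-off."""
    if incoming.phase is not contract.phase:
        raise ContractViolation(
            f"expected phase {contract.phase.name}, got {incoming.phase.name}")
    missing = sorted(contract.requires - set(incoming.payload))
    if missing:
        raise ContractViolation(f"input is missing {missing}")

    result = agent(incoming.payload)

    missing = sorted(contract.ensures - set(result))
    if missing:
        raise ContractViolation(f"output is missing {missing}")
    return Message(sender=agent_name, phase=contract.phase, payload=result)


# Usage: a stand-in "coder" agent that must turn a spec into code.
coder_contract = Contract(Phase.IMPLEMENT, frozenset({"spec"}), frozenset({"code"}))
request = Message("planner", Phase.IMPLEMENT, {"spec": "add two integers"})
reply = run_phase(coder_contract, request, "coder",
                  lambda p: {"code": "def add(a, b): return a + b"})
print(reply.sender, sorted(reply.payload))
```

In this sketch a violation stops the hand-off at the phase where it occurs, so an under-specified request or an incomplete output is caught before the next agent builds on it.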
Zhenyu Mao
City University of Hong Kong, Hong Kong, China
Jacky Keung
City University of Hong Kong, Hong Kong, China
Fengji Zhang
Department of Computer Science, City University of Hong Kong
Software Engineering, Large Language Models
Shuo Liu
City University of Hong Kong, Hong Kong, China
Yifei Wang
City University of Hong Kong, Hong Kong, China
Jialong Li
Waseda University
self-adaptive systems, requirement engineering, human-in-the-loop