Understanding Conversational Patterns in Multi-agent Programming: A Case Study on Fibonacci Game Development

📅 2026-05-22

🤖 AI Summary

This study addresses challenges in multi-agent collaboration for software engineering, such as role misalignment, unstable convergence, and error propagation, that hinder reliable code generation. It presents a systematic evaluation of two-role agent collaboration (Designer and Programmer) along three dimensions: efficiency, consistency, and effectiveness. The authors construct 12 dialogue systems by pairing seven open-source large language models (Gemma 2, Gemma 3, LLaMA 3.2, LLaMA 3.3, DeepSeek-R1, MiniCPM, and Qwen3) and analyze them using BLEU, ROUGE, and compilation success rates. Results show that the DeepSeek-R1 self-pairing converges to the correct solution in the first round and holds it through the final round, while the LLaMA 3.2 and Qwen3 self-pairings exhibit strong role alignment but diverge from the correct solution; all other pairings drift from the task and fail to converge. The work provides a quantifiable framework and empirical insights for evaluating multi-agent programming collaboration.

📝 Abstract

Large Language Models (LLMs) are increasingly applied to software engineering (SE), yet their potential for autonomous, role-oriented collaboration remains largely underexplored. Understanding how multiple LLM-based agents coordinate, maintain role alignment, and converge on solutions is critical for SE, because naively allowing agents to interact does not reliably lead to correct or stable outcomes. Recent empirical studies show that unstructured or poorly understood interaction dynamics can result in error propagation, premature consensus on incorrect solutions, or prolonged disagreement that prevents convergence, even when correct partial solutions are present early in the interaction. As an initial step towards addressing this gap, we undertake a systematic analysis of conversations between two agents, a Designer and a Programmer, across 12 model combinations drawn from 7 open-source LLMs (Gemma 2, Gemma 3, LLaMA 3.2, LLaMA 3.3, DeepSeek-R1, MiniCPM, and Qwen3). Our analysis is organized around three key dimensions of multi-agent interaction: efficiency (the speed and stability of convergence), consistency (the degree of role alignment, visualized with BLEU and ROUGE), and effectiveness (the extent of compilation success and error resolution). Results show that the DeepSeek-R1:DeepSeek-R1 pair was the only one to converge to the correct solution in the very first iteration and sustain it through the final iteration, while LLaMA 3.2:LLaMA 3.2 and Qwen3:Qwen3 demonstrated strong Designer:Programmer role alignment despite diverging from the correct solution. The other pairs deviated from the task and never converged on a result. These findings advance the understanding of agentic programming and highlight the need for further research on understanding and calibrating the convergence and stop conditions that are essential for future autonomous SE.
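
To make the consistency and effectiveness dimensions more concrete, here is a minimal Python sketch of how such scores could be computed for a pair of conversation turns. It is an illustration under stated assumptions, not the authors' pipeline: the function names (rouge1_f1, compiles, compilation_success_rate) are hypothetical, the overlap score is a simplified ROUGE-1-style F1 on whitespace tokens, and "compilation" is approximated by a Python syntax check because the abstract does not state which language or toolchain was used.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1-style F1: unigram overlap on lowercase whitespace tokens."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Multiset intersection keeps the minimum count of each shared token.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


def compiles(source: str) -> bool:
    """Assumption: 'compilation success' is approximated by a Python syntax check.

    The source is parsed only; it is never executed.
    """
    try:
        compile(source, "<agent_output>", "exec")
        return True
    except (SyntaxError, ValueError):
        return False


def compilation_success_rate(snippets: list[str]) -> float:
    """Fraction of snippets that pass the syntax check (0.0 for an empty list)."""
    if not snippets:
        return 0.0
    return sum(compiles(s) for s in snippets) / len(snippets)


if __name__ == "__main__":
    # Hypothetical Designer and Programmer turns, invented for illustration.
    designer_turn = "the game asks the player for the next fibonacci number"
    programmer_turn = "the program asks the player for the next number"
    print(f"role-alignment proxy: {rouge1_f1(designer_turn, programmer_turn):.2f}")

    # Hypothetical Programmer outputs from two iterations.
    outputs = ["def fib(n):\n    return n", "def fib(n:\n    return n"]
    print(f"compilation success rate: {compilation_success_rate(outputs):.2f}")
```

A high overlap score only indicates that the two agents are talking about the same thing, which matches the abstract's observation that strong role alignment can coexist with an incorrect solution; that is why a separate correctness-oriented signal such as compilation success is tracked alongside it.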

Problem

Research questions and friction points this paper is trying to address.

multi-agent programming

conversational patterns

role alignment

convergence

large language models

Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-agent programming

conversational patterns

role alignment