LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language Models

📅 2023-10-05
📈 Citations: 9
Influential: 1
🤖 AI Summary
This work evaluates large language models' (LLMs) capabilities in pure multi-agent coordination, where agents must cooperate without explicit communication or centralized control. To this end, the authors introduce LLM-Coordination, a benchmark designed specifically for pure coordination scenarios, comprising two task families: (1) Agentic Coordination, in which LLMs act as autonomous agents in four canonical pure coordination games; and (2) Coordination Question Answering (CoordQA), a set of 198 multiple-choice questions assessing environment understanding, theory of mind (ToM) reasoning, and joint planning. Experiments employ zero-shot and few-shot evaluation across models including GPT-4 and Llama2. Results show that LLMs excel when coordination decisions can be driven by environmental signals but exhibit significant limitations in tasks requiring them to model partners' beliefs and intentions, highlighting ToM reasoning and joint planning as critical bottlenecks. Notably, LLM agents demonstrate greater zero-shot coordination robustness with unseen partners than traditional reinforcement learning approaches.
📝 Abstract
Large Language Models (LLMs) have demonstrated emergent common-sense reasoning and Theory of Mind (ToM) capabilities, making them promising candidates for developing coordination agents. This study introduces the LLM-Coordination Benchmark, a novel benchmark for analyzing LLMs in the context of Pure Coordination Settings, where agents must cooperate to maximize gains. Our benchmark evaluates LLMs through two distinct tasks. The first is Agentic Coordination, where LLMs act as proactive participants in four pure coordination games. The second is Coordination Question Answering (CoordQA), which tests LLMs on 198 multiple-choice questions across these games to evaluate three key abilities: Environment Comprehension, ToM Reasoning, and Joint Planning. Results from Agentic Coordination experiments reveal that LLM-Agents excel in multi-agent coordination settings where decision-making primarily relies on environmental variables but face challenges in scenarios requiring active consideration of partners' beliefs and intentions. The CoordQA experiments further highlight significant room for improvement in LLMs' Theory of Mind reasoning and joint planning capabilities. Zero-Shot Coordination (ZSC) experiments in the Agentic Coordination setting demonstrate that LLM agents, unlike RL methods, exhibit robustness to unseen partners. These findings indicate the potential of LLMs as Agents in pure coordination setups and underscore areas for improvement. Code Available at https://github.com/eric-ai-lab/llm_coordination.
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs' multi-agent coordination abilities in fully cooperative settings
Assessing Theory of Mind reasoning and joint planning capabilities in LLMs
Testing the robustness of LLM agents when paired with unseen partners
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-Coordination Benchmark for analyzing multi-agent coordination
Agentic Coordination and CoordQA evaluation tasks
Zero-Shot Coordination robustness with unseen partners
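The CoordQA task scores LLMs on multiple-choice questions about coordination scenarios. As a rough illustration of how such an evaluation loop might work, here is a minimal sketch; the question text, option labels, and the `query_llm` stub are hypothetical and not taken from the benchmark itself.

```python
# Hypothetical sketch of a CoordQA-style multiple-choice evaluation loop.
# Question content and the query_llm stub are illustrative assumptions,
# not the benchmark's actual prompts or API.

def query_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; here it always answers 'B'."""
    return "B"

def evaluate(questions) -> float:
    """Score a list of multiple-choice items by exact option-letter match."""
    correct = 0
    for q in questions:
        options = "\n".join(f"{label}. {text}" for label, text in q["options"])
        prompt = (
            f"{q['question']}\n{options}\n"
            "Answer with the letter of the best option."
        )
        if query_llm(prompt).strip() == q["answer"]:
            correct += 1
    return correct / len(questions)

sample = [{
    "question": "Your partner is heading to the onion station. "
                "Which action best supports joint planning?",
    "options": [("A", "Also fetch an onion"), ("B", "Fetch a plate instead")],
    "answer": "B",
}]
print(evaluate(sample))  # 1.0 with this stub
```

In practice `query_llm` would wrap a real model call (zero-shot or few-shot), and accuracy would be reported per ability category (environment comprehension, ToM reasoning, joint planning).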