DipLLM: Fine-Tuning LLM for Strategic Decision-making in Diplomacy

📅 2025-06-11
📈 Citations: 1
Influential: 0
📄 PDF
🤖 AI Summary
To address the exponential growth of the joint action space, complex strategic interdependencies, and high computational overhead in the multiplayer game of Diplomacy, this paper proposes a lightweight large language model (LLM) fine-tuning framework. Methodologically, it introduces (i) autoregressive factorization, a modeling paradigm that decomposes the joint action space into a sequence of unit-level decisions, and (ii) an equilibrium policy, defined within this factorized framework, as the fine-tuning objective. Remarkably, the framework surpasses Cicero's performance using only 1.5% of Cicero's training data. Evaluated in Diplomacy, it significantly improves strategic consistency and win rate. These results empirically validate the efficacy and scalability of data-efficient LLM fine-tuning for complex multi-agent strategic decision-making.

📝 Abstract
Diplomacy is a complex multiplayer game that requires both cooperation and competition, posing significant challenges for AI systems. Traditional methods rely on equilibrium search to generate extensive game data for training, which demands substantial computational resources. Large Language Models (LLMs) offer a promising alternative, leveraging pre-trained knowledge to achieve strong performance with relatively small-scale fine-tuning. However, applying LLMs to Diplomacy remains challenging due to the exponential growth of possible action combinations and the intricate strategic interactions among players. To address this challenge, we propose DipLLM, a fine-tuned LLM-based agent that learns equilibrium policies for Diplomacy. DipLLM employs an autoregressive factorization framework to simplify the complex task of multi-unit action assignment into a sequence of unit-level decisions. By defining an equilibrium policy within this framework as the learning objective, we fine-tune the model using only 1.5% of the data required by the state-of-the-art Cicero model, surpassing its performance. Our results demonstrate the potential of fine-tuned LLMs for tackling complex strategic decision-making in multiplayer games.
Problem

Research questions and friction points this paper is trying to address.

AI struggles with Diplomacy's cooperative-competitive complexity
Traditional methods need excessive computational resources
LLMs face action-combination explosion in Diplomacy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuned LLM for Diplomacy decision-making
Autoregressive factorization for multi-unit actions
Equilibrium policy learning with minimal data
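The factorization idea above can be illustrated with a toy sketch (not the paper's code): the joint policy over all units, π(a | s), is decomposed autoregressively as a product of unit-level decisions, π(a | s) = ∏ᵢ π(aᵢ | s, a₁…aᵢ₋₁), so each unit's order is chosen conditioned on the orders already assigned. The `score_fn` below is a hypothetical stand-in for the fine-tuned LLM's per-unit action scoring.

```python
# Toy sketch of autoregressive action factorization (illustrative only).
# Joint policy pi(a | s) is decomposed into sequential unit-level choices:
# pi(a | s) = prod_i pi(a_i | s, a_1..a_{i-1}).
from typing import Callable, Sequence


def factorized_joint_action(
    state: str,
    unit_actions: Sequence[Sequence[str]],  # candidate orders per unit
    score_fn: Callable[[str, tuple, str], float],  # (state, prior, candidate) -> score
) -> list[str]:
    """Pick one order per unit, conditioning each choice on earlier ones."""
    chosen: list[str] = []
    for candidates in unit_actions:
        # Greedy decoding: take the highest-scoring candidate given context.
        best = max(candidates, key=lambda a: score_fn(state, tuple(chosen), a))
        chosen.append(best)
    return chosen


# Hypothetical stand-in scorer: prefers supporting once an attack is ordered.
def toy_score(state: str, prior: tuple, candidate: str) -> float:
    if "attack" in prior and candidate.startswith("support"):
        return 2.0
    return 1.0 if candidate == "attack" else 0.5


orders = factorized_joint_action(
    "spring_1901",
    [["hold", "attack"], ["hold", "support attack"]],
    toy_score,
)
print(orders)  # ['attack', 'support attack']
```

The key point the sketch captures is that later unit decisions can depend on earlier ones, which lets a single sequential model coordinate multi-unit orders without enumerating the exponential joint action space.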
Kaixuan Xu
School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China; State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Jiajun Chai
Meituan Inc.
Reinforcement Learning, LLMs, Agentic Learning
Sicheng Li
School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China; State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Yuqian Fu
School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China; State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Yuanheng Zhu
Institute of Automation, Chinese Academy of Sciences
Dongbin Zhao
Institute of Automation, Chinese Academy of Sciences
Deep Reinforcement Learning, Adaptive Dynamic Programming, Game AI, Smart Driving, Robotics