🤖 AI Summary
To address the exponential growth of the joint action space, complex strategic interdependencies, and high computational overhead in multi-agent diplomatic games, this paper proposes a lightweight large language model (LLM) fine-tuning framework. Methodologically, it introduces (i) autoregressive factorization—a modeling paradigm that decomposes the joint action space into a sequence of unit-level decisions—and (ii) an equilibrium policy, defined within this factorized framework, as the fine-tuning objective. Remarkably, the framework surpasses Cicero's performance using only 1.5% of its training data. Evaluated on the Diplomacy benchmark, it significantly improves strategic consistency and win rate. These results empirically validate the efficacy and scalability of data-efficient LLMs for complex multi-agent coordination problems under sparse supervision.
📝 Abstract
Diplomacy is a complex multiplayer game that requires both cooperation and competition, posing significant challenges for AI systems. Traditional methods rely on equilibrium search to generate extensive game data for training, which demands substantial computational resources. Large Language Models (LLMs) offer a promising alternative, leveraging pre-trained knowledge to achieve strong performance with relatively small-scale fine-tuning. However, applying LLMs to Diplomacy remains challenging due to the exponential growth of possible action combinations and the intricate strategic interactions among players. To address this challenge, we propose DipLLM, a fine-tuned LLM-based agent that learns equilibrium policies for Diplomacy. DipLLM employs an autoregressive factorization framework to simplify the complex task of multi-unit action assignment into a sequence of unit-level decisions. By defining an equilibrium policy within this framework as the learning objective, we fine-tune the model using only 1.5% of the data required by the state-of-the-art Cicero model, surpassing its performance. Our results demonstrate the potential of fine-tuned LLMs for tackling complex strategic decision-making in multiplayer games.
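The core idea behind the autoregressive factorization can be sketched in a few lines: instead of scoring every combination of orders for all units at once (which grows exponentially with the number of units), the policy picks one unit's action at a time, conditioning on the choices already made, and the joint probability is the product of the per-unit probabilities. The sketch below is illustrative only; the unit names, order strings, and the uniform `unit_policy` stand-in are hypothetical placeholders, not the paper's actual LLM policy.

```python
import random

def unit_policy(state, unit, prior_actions, legal_actions):
    """Toy stand-in for the fine-tuned LLM head: a uniform
    distribution over the unit's legal orders, conditioned (in the
    real model) on the state and previously assigned orders."""
    p = 1.0 / len(legal_actions)
    return {a: p for a in legal_actions}

def sample_joint_action(state, units, legal, rng):
    """Assign one order per unit sequentially (autoregressively).

    Returns the chosen orders and the joint probability
    pi(a_1..a_N | s) = prod_i pi(a_i | s, a_1..a_{i-1})."""
    chosen, joint_p = [], 1.0
    for unit in units:
        dist = unit_policy(state, unit, chosen, legal[unit])
        actions, probs = zip(*dist.items())
        a = rng.choices(actions, weights=probs)[0]
        chosen.append((unit, a))
        joint_p *= dist[a]
    return chosen, joint_p

rng = random.Random(0)
units = ["A PAR", "F BRE"]  # hypothetical units for illustration
legal = {"A PAR": ["HOLD", "A PAR - BUR"],
         "F BRE": ["HOLD", "F BRE - MAO"]}
orders, p = sample_joint_action("spring_1901", units, legal, rng)
# With uniform per-unit policies, the joint probability is (1/2)*(1/2) = 0.25
```

The key property is that the sequential loop touches each unit's legal-order list once, so decision cost scales linearly in the number of units rather than with the product of their action sets.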