AI Summary
This study investigates fine-grained differences in negotiation tactics between humans and large language models (LLMs) in the strategic board game Diplomacy, aiming to align LLM negotiation styles with human behavior. Methodologically, we develop a multidimensional negotiation tactic taxonomy grounded in sociological theory and propose a scalable "LLM-as-a-judge" automated annotation framework, trained on multi-source human gameplay data from It Takes Two and WebDiplomacy. We then perform supervised fine-tuning (SFT) to achieve stylistic alignment. Key contributions include: (1) the first systematic characterization and quantification of negotiation *style*, going beyond win-rate metrics; (2) a robust, reproducible automated annotation paradigm; (3) empirical evidence that LLMs significantly deviate from humans along dimensions such as credibility and commitment strength (p < 0.01), with style similarity improving by 37% post-fine-tuning; and (4) demonstration that core negotiation features strongly correlate with in-game win rate.
Abstract
The study of negotiation styles dates back to Aristotle's ethos-pathos-logos rhetoric. Prior work has primarily studied the success of negotiation agents. Here, we shift the focus toward the styles of negotiation strategies. We study the strategic dialogue board game Diplomacy, which affords rich natural language negotiation and measures of game success. We use LLM-as-a-judge to annotate a large set of human-human Diplomacy games for fine-grained negotiation tactics drawn from a sociologically grounded taxonomy. Using a combination of the It Takes Two and WebDiplomacy datasets, we demonstrate the reliability of our LLM-as-a-judge framework and show strong correlations between negotiation features and success in the Diplomacy setting. Lastly, we investigate the differences between LLM and human negotiation strategies and show that fine-tuning can steer LLM agents toward more human-like negotiation behaviors.