Multi-agent KTO: Reinforcing Strategic Interactions of Large Language Model in Language Game

📅 2025-01-24

🤖 AI Summary

Problem: Modeling social reasoning for artificial general intelligence (AGI) remains challenging, particularly in strategic text-based games (e.g., *Werewolf*) that require coherent natural language interaction and rational decision-making under uncertainty. Method: We propose MaKTO, the first end-to-end multi-agent framework grounded in Wittgenstein’s language-game theory, departing from conventional two-stage “decision-then-generation” paradigms. MaKTO integrates adversarial multi-agent gameplay, unpaired preference data construction, Kahneman–Tversky Optimization (KTO), and in-context interactive distillation. Contribution/Results: Evaluated on 9-player *Werewolf*, MaKTO achieves a 61% average win rate, a relative improvement of 23.0% over GPT-4o and 10.9% over two-stage RL baselines. Against human experts it attains a 60% win rate, and in Turing-style blind tests human evaluators detect it as an AI only 49% of the time, which suggests that strategic competence and human-like language use can be learned together.
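Because the abstract reports these gains as relative improvements, they are ratios rather than percentage-point differences. The short sketch below back-calculates the baseline win rates this would imply, assuming (this is an assumption, not something the page states) that the 61% average and both relative gains are measured on the same set of games. The function name `implied_baseline` is hypothetical.

```python
def implied_baseline(win_rate: float, relative_gain: float) -> float:
    """Baseline win rate implied by win_rate = baseline * (1 + relative_gain)."""
    return win_rate / (1.0 + relative_gain)


makto_win_rate = 0.61  # average win rate reported for MaKTO

# Assumption: both relative gains are computed against this same 61% figure.
gpt4o_implied = implied_baseline(makto_win_rate, 0.230)
two_stage_rl_implied = implied_baseline(makto_win_rate, 0.109)

print(f"implied GPT-4o win rate:       {gpt4o_implied:.3f}")
print(f"implied two-stage RL win rate: {two_stage_rl_implied:.3f}")
```

Under that assumption the implied baselines are roughly 50% and 55%, so the gaps in percentage points are noticeably smaller than 23.0 and 10.9.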

📝 Abstract

Achieving Artificial General Intelligence (AGI) requires AI agents that can not only make strategic decisions but also engage in flexible and meaningful communication. Inspired by Wittgenstein's language game theory in Philosophical Investigations, we propose that language agents can learn through in-context interaction rather than traditional multi-stage frameworks that separate decision-making from language expression. Using Werewolf, a social deduction game that tests language understanding, strategic interaction, and adaptability, we develop the Multi-agent Kahneman & Tversky's Optimization (MaKTO). MaKTO engages diverse models in extensive gameplay to generate unpaired desirable and unacceptable responses, then employs KTO to refine the model's decision-making process. In 9-player Werewolf games, MaKTO achieves a 61% average win rate across various models, outperforming GPT-4o and two-stage RL agents by relative improvements of 23.0% and 10.9%, respectively. Notably, MaKTO also demonstrates human-like performance, winning 60% against expert players and showing only 49% detectability in Turing-style blind tests. These results showcase MaKTO's superior decision-making, strategic adaptation, and natural language generation in complex social deduction games.
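To illustrate what training on unpaired desirable and unacceptable responses can look like, here is a minimal sketch of the standard KTO objective in plain Python. It is not MaKTO's implementation: the `Sample` record, the function names, and the default hyperparameters are hypothetical, and the reference point `z0` (a batch-level KL estimate in KTO) is simply passed in as a fixed number. Each response carries only a binary label, so no chosen/rejected pairing is needed.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    """One unpaired response with a binary label (hypothetical record layout)."""
    policy_logp: float  # log-probability of the response under the trained policy
    ref_logp: float     # log-probability of the response under the frozen reference
    desirable: bool     # True = desirable, False = unacceptable


def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def kto_loss(samples, beta=0.1, lambda_d=1.0, lambda_u=1.0, z0=0.0):
    """Mean KTO loss over unpaired samples.

    r  = log pi_theta(y|x) - log pi_ref(y|x)   (implied reward)
    desirable:    loss = lambda_d * (1 - sigmoid(beta * (r - z0)))
    unacceptable: loss = lambda_u * (1 - sigmoid(beta * (z0 - r)))
    Assumption: z0 is supplied by the caller instead of being estimated
    from the batch.
    """
    total = 0.0
    for s in samples:
        r = s.policy_logp - s.ref_logp
        if s.desirable:
            total += lambda_d * (1.0 - sigmoid(beta * (r - z0)))
        else:
            total += lambda_u * (1.0 - sigmoid(beta * (z0 - r)))
    return total / len(samples)


# Toy batch: one desirable and one unacceptable response, no pairing between them.
batch = [
    Sample(policy_logp=-12.0, ref_logp=-15.0, desirable=True),
    Sample(policy_logp=-20.0, ref_logp=-18.0, desirable=False),
]
print(kto_loss(batch))
```

The loss falls when the policy raises the likelihood of desirable responses relative to the reference, and when it lowers the likelihood of unacceptable ones, which is what lets gameplay logs with per-action labels be used directly.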

Problem

Research questions and friction points this paper is trying to address.

Language Models

Strategic Interaction

Text-Based Games

Innovation

Methods, ideas, or system contributions that make the work stand out.

MaKTO

Strategic Interaction Learning

Natural Language Processing

🔎 Similar Papers

Self-playing Adversarial Language Game Enhances LLM Reasoning