Self-playing Adversarial Language Game Enhances LLM Reasoning

📅 2024-04-16

🏛️ arXiv.org

📈 Citations: 11

✨ Influential: 1

🤖 AI Summary

To address the limitations of large language models (LLMs) in deep semantic understanding and counterfactual reasoning on complex inference tasks, this paper proposes Adversarial Taboo, a two-player adversarial language game that sets up an implicit semantic contest under information constraints. Methodologically, the paper introduces a self-play adversarial training paradigm: a single LLM takes both the attacker and the defender roles in multi-turn dialogues in which the target word is hidden from the defender. Policy optimization relies solely on reinforcement learning driven by win/loss signals, with no human annotations or external supervision. Experiments show performance gains across a range of reasoning benchmarks, and repeating the self-play process over multiple iterations continues to improve reasoning ability. The implementation is publicly available.

📝 Abstract

We explore the potential of self-play training for large language models (LLMs) in a two-player adversarial language game called Adversarial Taboo. In this game, an attacker and a defender communicate around a target word only visible to the attacker. The attacker aims to induce the defender to speak the target word unconsciously, while the defender tries to infer the target word from the attacker's utterances. To win the game, both players must have sufficient knowledge about the target word and high-level reasoning ability to infer and express in this information-reserved conversation. Hence, we are curious about whether LLMs' reasoning ability can be further enhanced by Self-Playing this Adversarial language Game (SPAG). With this goal, we select several open-source LLMs and let each act as the attacker and play with a copy of itself as the defender on an extensive range of target words. Through reinforcement learning on the game outcomes, we observe that the LLMs' performances uniformly improve on a broad range of reasoning benchmarks. Furthermore, iteratively adopting this self-play process can continuously promote LLMs' reasoning abilities. The code is available at https://github.com/Linear95/SPAG.
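The win conditions described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the code from the linked repository: the names `judge_episode`, `outcome_rewards`, and `mentions` are hypothetical, and the guess phrase, the tie outcome when nobody wins, and the +1/-1/0 reward values are assumptions that the abstract does not specify.

```python
import re
from typing import Dict, List, Tuple

# Assumed guess format; the abstract does not say how the defender announces a guess.
GUESS_PREFIX = "i know the word! it is"


def mentions(word: str, utterance: str) -> bool:
    """Return True if `word` occurs as a whole word in `utterance` (case-insensitive)."""
    pattern = rf"\b{re.escape(word)}\b"
    return re.search(pattern, utterance, flags=re.IGNORECASE) is not None


def judge_episode(target: str, turns: List[Tuple[str, str]]) -> str:
    """Decide the winner of one dialogue given as (role, utterance) pairs.

    Returns "attacker", "defender", or "tie".
    """
    for role, utterance in turns:
        if role != "defender":
            continue  # only defender utterances decide the outcome in this sketch
        text = utterance.strip().lower()
        if text.startswith(GUESS_PREFIX):
            # An explicit guess ends the game: right guess wins, wrong guess loses.
            return "defender" if mentions(target, utterance) else "attacker"
        if mentions(target, utterance):
            # The defender said the target word without realizing it.
            return "attacker"
    return "tie"  # assumed outcome when the dialogue ends with no winner


def outcome_rewards(winner: str) -> Dict[str, float]:
    """Map a game outcome to per-role scalar rewards (assumed zero-sum values)."""
    if winner == "tie":
        return {"attacker": 0.0, "defender": 0.0}
    loser = "defender" if winner == "attacker" else "attacker"
    return {winner: 1.0, loser: -1.0}


if __name__ == "__main__":
    dialogue = [
        ("attacker", "Which fruit is said to keep the doctor away?"),
        ("defender", "An apple a day, I suppose."),
    ]
    winner = judge_episode("apple", dialogue)
    print(winner, outcome_rewards(winner))
```

The whole-word match means that "pineapple" does not count as saying "apple", and neither does the plural "apples"; a real judge would need to decide how to treat inflected forms.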

Problem

Research questions and friction points this paper is trying to address.

Self-play Training

Large Language Models (LLMs)

Adversarial Language Games

Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-play Training

Adversarial Taboo

Reinforcement Learning
