AI Summary
Existing language agents exhibit insufficient strategic reasoning in dynamic adversarial games, and opponent selection mechanisms in self-play lack systematic investigation. Method: We propose Step-level poliCy Optimization through Play-And-Learn (SCO-PAL), a multi-stage reinforcement learning framework grounded in self-play, featuring a dynamic feedback mechanism for fine-grained policy refinement. Contribution/Results: SCO-PAL introduces the first causal analysis of how opponent difficulty influences the evolution of strategic reasoning capabilities, and designs an adaptive opponent-matching strategy. Evaluated across six adversarial games, SCO-PAL improves the average win rate over baselines by approximately 30% and attains a 54.76% win rate against GPT-4, significantly outperforming conventional methods that rely on expert annotations or static feedback. This work establishes a scalable paradigm for strategic autonomous learning in language agents.
Abstract
Existing language agents often struggle in dynamic adversarial games due to poor strategic reasoning. A promising way to mitigate this limitation is to let agents learn from game interactions automatically, without relying on costly expert-labeled data. Unlike static environments, where agents receive fixed feedback or rewards, dynamic adversarial games make the choice of opponent a significant factor in learning performance. However, opponent selection in adversarial environments remains underexplored. In this paper, we propose a Step-level poliCy Optimization method through Play-And-Learn, SCO-PAL. Leveraging SCO-PAL, we conduct a detailed analysis of opponent selection by setting opponents at different levels and find that self-play is the most effective way to improve strategic reasoning in such adversarial environments. Using SCO-PAL with self-play, we increase the average win rate against four opponents by approximately 30% compared to baselines and achieve a 54.76% win rate against GPT-4 in six adversarial games.
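The abstract does not specify how opponents are matched, so the following is only a loose illustration (all names and the scalar "strength" model are hypothetical, not from the paper) of one way adaptive opponent matching can reduce to self-play: if a snapshot of the current agent sits in the opponent pool, picking the opponent closest in strength selects that snapshot.

```python
def pick_opponent(agent_strength, opponent_pool):
    """Hypothetical adaptive opponent matching.

    Returns the opponent whose strength is closest to the agent's own.
    When the pool contains a frozen snapshot of the agent itself, that
    snapshot is chosen, which is exactly the self-play setting the
    abstract reports as most effective.
    """
    return min(opponent_pool, key=lambda s: abs(s - agent_strength))


# Pool of fixed-level opponents plus a snapshot of the agent itself (1.3).
pool = [0.2, 0.8, 1.3, 2.5]
chosen = pick_opponent(1.3, pool)  # the self-play snapshot is selected
```

This is a sketch under the assumption that opponent difficulty can be summarized by a single scalar; in practice, difficulty would be estimated from observed win rates rather than given directly.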