🤖 AI Summary
Large language model (LLM)-driven non-player characters (NPCs) in social deduction games (e.g., Werewolf) suffer from inflexible intelligence control, rigid difficulty scaling, and insufficient safety and controllability.
Method: We propose a win-rate-controllable LLM agent framework built on a tripartite architecture (Predictor, Decider, and Discussor), integrated with role-aware multi-stage prompting and win-rate-constrained Proximal Policy Optimization (PPO) reinforcement learning. Crucially, the target win rate is embedded explicitly as a constraint in the decision-chain reward function.
Contribution/Results: This enables fine-grained, continuous adaptation of agent capability, from novice to expert, while ensuring fairness, safety, and interpretability. Experiments on Werewolf demonstrate significant improvements over baselines, with win-rate control error within ±2%, validating precise, reliable, and controllable agent behavior.
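The summary above does not spell out the reward, but the idea of embedding the target win rate as a constraint can be sketched as a shaped reward: the ordinary game reward minus a penalty on the gap between the agent's running win rate and the specified target. The function name, the absolute-difference penalty, and the coefficient `lam` below are illustrative assumptions, not the paper's exact formulation:

```python
def win_rate_constrained_reward(base_reward: float,
                                empirical_win_rate: float,
                                target_win_rate: float,
                                lam: float = 1.0) -> float:
    """Hypothetical decision-chain reward sketch: penalize deviation of the
    agent's empirical win rate from the specified target win rate."""
    penalty = lam * abs(empirical_win_rate - target_win_rate)
    return base_reward - penalty

# An agent winning far too often (0.80) against a 0.50 target receives a
# lower shaped reward than one already near the target (0.52), so PPO is
# pushed toward the target proficiency rather than pure win maximization.
r_far = win_rate_constrained_reward(1.0, 0.80, 0.50)
r_near = win_rate_constrained_reward(1.0, 0.52, 0.50)
```

Under this shaping, maximizing expected reward no longer coincides with maximizing the win rate; the optimum shifts toward play whose long-run win rate matches the target, which is one plausible way to realize the "novice to expert" dial the summary describes.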
📝 Abstract
Large Language Models (LLMs) have advanced the capability of game agents in social deduction games (SDGs). These games rely heavily on conversation-driven interactions and require agents to infer, make decisions, and express themselves based on this information. While this progress leads to more sophisticated and strategic non-player characters (NPCs) in SDGs, there is a need to control the proficiency of these agents. This control not only ensures that NPCs can adapt to varying difficulty levels during gameplay, but also provides insights into the safety and fairness of LLM agents. In this paper, we present DVM, a novel framework for developing controllable LLM agents for SDGs, and demonstrate its implementation on one of the most popular SDGs, Werewolf. DVM comprises three main components: Predictor, Decider, and Discussor. By integrating reinforcement learning with a win-rate-constrained decision chain reward mechanism, we enable agents to dynamically adjust their gameplay proficiency to achieve specified win rates. Experiments show that DVM not only outperforms existing methods in the Werewolf game, but also successfully modulates its performance levels to meet predefined win rate targets. These results pave the way for LLM agents' adaptive and balanced gameplay in SDGs, opening new avenues for research in controllable game agents.