🤖 AI Summary
This work addresses the highly complex multi-agent social game *So Long Sucker* (SLS), characterized by dynamic coalition formation, strategic deception, and player elimination—posing significant challenges for multi-agent reinforcement learning (MARL).
Method: We introduce the first open-source, reproducible SLS environment—including a graphical user interface and deep reinforcement learning (DRL) benchmarking toolkit—and systematically evaluate DQN, DDQN, and Dueling DQN. To improve action feasibility, we propose rule-compliance constraints and reward shaping.
Contribution/Results: Our approach achieves >95% preference for legal actions and enables agents to converge to ~50% of the theoretical maximum reward; however, convergence requires ~2,000 episodes and occasional illegal actions persist. This establishes the first DRL baseline for SLS, revealing fundamental training efficiency bottlenecks and demonstrating both the feasibility and structural limitations of classical MARL methods in dynamic coalition games.
📝 Abstract
This paper examines the use of classical deep reinforcement learning (DRL) algorithms, DQN, DDQN, and Dueling DQN, in the strategy game So Long Sucker (SLS), a diplomacy-driven game defined by coalition-building and strategic betrayal. SLS poses unique challenges due to its blend of cooperative and adversarial dynamics, making it an ideal platform for studying multi-agent learning and game theory. The study's primary goal is to teach autonomous agents the game's rules and strategies using classical DRL methods. To support this effort, the authors developed a novel, publicly available implementation of SLS, featuring a graphical user interface (GUI) and benchmarking tools for DRL algorithms. Experimental results reveal that DQN, DDQN, and Dueling DQN agents, though basic by modern DRL standards, achieved roughly 50% of the maximum possible game reward. This suggests a baseline understanding of the game's mechanics, with agents favoring legal moves over illegal ones. However, a significant limitation was the extensive training required, around 2,000 games, for agents to reach peak performance, whereas human players grasp the game within a few rounds. Even after prolonged training, agents occasionally made illegal moves, highlighting both the potential and the limitations of these classical DRL methods in semi-complex, socially driven games. The findings establish a foundational benchmark for training agents in SLS and similar negotiation-based environments while underscoring the need for advanced or hybrid DRL approaches to improve learning efficiency and adaptability. Future research could incorporate game-theoretic strategies to enhance agent decision-making in dynamic multi-agent contexts.
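The abstract reports that agents learn to prefer legal moves via rule-compliance constraints and reward shaping, but does not spell out the mechanism. A common way to realize both ideas in value-based DRL is to mask illegal actions during greedy selection and to penalize illegal attempts in the reward signal; the sketch below illustrates that pattern under these assumptions (function names and the penalty value are illustrative, not taken from the paper):

```python
import numpy as np

def masked_greedy_action(q_values, legal_mask):
    """Greedy action selection restricted to legal moves.

    q_values:   array of Q-value estimates, one entry per action
    legal_mask: boolean array, True where the action is legal
    """
    # Suppress illegal actions so argmax can never choose them.
    q = np.where(legal_mask, q_values, -np.inf)
    return int(np.argmax(q))

def shaped_reward(base_reward, action_was_legal, illegal_penalty=-1.0):
    """Reward shaping: add a fixed penalty for attempting an illegal move."""
    return base_reward if action_was_legal else base_reward + illegal_penalty
```

With hard masking the >95% legal-action preference reported above would be enforced at selection time; a softer alternative is to rely on the penalty term alone and let the agent learn legality from experience, which is slower but keeps the action space unconstrained.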