A Generalist Hanabi Agent

📅 2025-03-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Hanabi multi-agent reinforcement learning (MARL) agents are typically trained with fixed player counts (e.g., two players) and homogeneous partners, resulting in poor generalization to unseen collaborators or varying team sizes. This work introduces the first generalist Hanabi agent capable of zero-shot adaptation to arbitrary team sizes (2–5 players) and heterogeneous algorithmic partners. The approach features: (1) a unified, text-based task representation that dynamically aligns observation and action spaces across variable player counts; and (2) R3D2, a Recurrent Replay Relevance Distributed DQN framework that integrates language understanding with distributed MARL to enable policy transfer and collaborative reasoning. Experiments demonstrate state-of-the-art performance across all Hanabi configurations, and the agent significantly outperforms specialized baselines when collaborating with diverse, previously unseen partners. The implementation is publicly available.
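The text-based representation described above can be illustrated with a minimal sketch. Everything below (function names, the exact string format, the card notation) is hypothetical and only conveys the idea that rendering observations as text lets one encoder handle any team size; it is not the paper's actual encoding.

```python
# Hypothetical sketch of a text-based Hanabi observation: the game state
# is rendered as a string, so 2- to 5-player games share one input format.
# All names and the string layout are illustrative, not the paper's.

def observation_to_text(hands, fireworks, hint_tokens, life_tokens, discards):
    """Render a Hanabi observation as plain text (illustrative format)."""
    lines = [
        f"life tokens: {life_tokens}, hint tokens: {hint_tokens}",
        "fireworks: " + " ".join(f"{color}{rank}" for color, rank in fireworks.items()),
        "discards: " + " ".join(discards),
    ]
    # One line per visible partner hand: the number of lines grows with the
    # team size, so the observation space adapts to the player count.
    for player, hand in hands.items():
        lines.append(f"{player} holds: " + " ".join(hand))
    return "\n".join(lines)

obs = observation_to_text(
    hands={"player 1": ["R1", "B3"], "player 2": ["Y2", "W5"]},
    fireworks={"R": 1, "B": 0, "Y": 2, "W": 0, "G": 0},
    hint_tokens=6,
    life_tokens=3,
    discards=["G4"],
)
print(obs)
```

Because a third or fourth player simply adds another "holds" line, the same representation covers every game setting without changing the input dimensionality of a fixed-size vector encoding.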

📝 Abstract
Traditional multi-agent reinforcement learning (MARL) systems can develop cooperative strategies through repeated interactions. However, these systems are unable to perform well on any other setting than the one they have been trained on, and struggle to successfully cooperate with unfamiliar collaborators. This is particularly visible in the Hanabi benchmark, a popular 2-to-5 player cooperative card game which requires complex reasoning and precise assistance to other agents. Current MARL agents for Hanabi can only learn one specific game-setting (e.g., 2-player games), and play with the same algorithmic agents. This is in stark contrast to humans, who can quickly adjust their strategies to work with unfamiliar partners or situations. In this paper, we introduce Recurrent Replay Relevance Distributed DQN (R3D2), a generalist agent for Hanabi, designed to overcome these limitations. We reformulate the task using text, as language has been shown to improve transfer. We then propose a distributed MARL algorithm that copes with the resulting dynamic observation- and action-space. In doing so, our agent is the first that can play all game settings concurrently, and extend strategies learned from one setting to other ones. As a consequence, our agent also demonstrates the ability to collaborate with different algorithmic agents -- agents that are themselves unable to do so. The implementation code is available at: https://github.com/chandar-lab/R3D2-A-Generalist-Hanabi-Agent
Problem

Research questions and friction points this paper is trying to address.

MARL agents trained on one Hanabi setting (e.g., 2-player games) fail to generalize to other team sizes
Learned policies break down when paired with unfamiliar algorithmic partners
Fixed observation and action spaces prevent a single agent from playing all game settings
Innovation

Methods, ideas, or system contributions that make the work stand out.

Recurrent Replay Relevance Distributed DQN (R3D2)
Text-based task reformulation for transfer learning
Distributed MARL algorithm for dynamic spaces
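The last point, handling a dynamic action space, can be sketched as scoring each text-rendered action against the observation, so the action set may grow or shrink with the team size. This toy sketch uses a hash-based bag-of-words embedding as a stand-in for the learned text encoder; the function names, embedding scheme, and scoring head are all illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def embed(text, dim=32):
    """Toy bag-of-words embedding: each token hashes to a fixed random
    vector. A stand-in for a learned text encoder; illustrative only."""
    vec = np.zeros(dim)
    for tok in text.split():
        tok_rng = np.random.default_rng(abs(hash(tok)) % (2**32))
        vec += tok_rng.standard_normal(dim)
    return vec

def q_values(obs_text, action_texts, weights):
    """Score a variable-size set of text actions against one observation.
    Because actions are strings, the same function works for any number
    of legal moves, and hence for any team size."""
    obs_vec = embed(obs_text)
    return [float(np.concatenate([obs_vec, embed(a)]) @ weights) for a in action_texts]

rng = np.random.default_rng(0)
weights = rng.standard_normal(64)  # 2 * dim, a stand-in for learned parameters

actions = ["play card 1", "discard card 2", "hint player 2 color red"]
qs = q_values("fireworks: R1 B0, hint tokens: 6", actions, weights)
best = actions[int(np.argmax(qs))]  # greedy action over a dynamic action set
```

The design choice this illustrates: by scoring (observation, action) text pairs instead of emitting a fixed-width Q-vector, the network never needs to know the action count in advance, which is what lets one agent play 2- through 5-player games.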