🤖 AI Summary
This study addresses the robustness of multi-agent AI systems in open-world environments when confronted with unanticipated novelty. To rigorously evaluate adaptive capabilities, we introduce GNOME, a novel assessment platform that strictly decouples agent policies from environment simulators, enabling dynamic injection of unforeseen, non-predefined events into the simulation. This design eliminates model selection bias and enables faithful evaluation of real-time adaptation. GNOME features a modular simulation architecture, integrates state-of-the-art multi-agent reinforcement learning algorithms, and provides an interactive Web GUI, using strategy board games (e.g., Monopoly) as standardized testbeds. As a core evaluation infrastructure for DARPA's SAIL-ON program, GNOME has been deployed to assess multiple external research teams and was publicly demonstrated at NeurIPS 2020. It has catalyzed scholarly discourse on the nature of novelty and open-world adaptability in AI, establishing a principled benchmark for evaluating resilience to unanticipated environmental changes.
📄 Abstract
We describe GNOME (Generating Novelty in Open-world Multi-agent Environments), an experimental platform designed to test the effectiveness of multi-agent AI systems when faced with *novelty*. GNOME separates the development of AI game-playing agents from the simulator, allowing *unanticipated* novelty (in essence, novelty that is not subject to model-selection bias). Using a Web GUI, GNOME was recently demonstrated at NeurIPS 2020 using the game of Monopoly to foster an open discussion on AI robustness and the nature of novelty in real-world environments. In this article, we further detail the key elements of the demonstration, and also provide an overview of the experimental design currently being used in the DARPA Science of Artificial Intelligence and Learning for Open-World Novelty (SAIL-ON) program to evaluate external teams developing novelty-adaptive game-playing agents.
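The agent/simulator decoupling described above can be sketched minimally in code. This is an illustrative assumption, not GNOME's actual API: the names `Agent`, `Simulator`, and `inject_novelty` are hypothetical. The point is that the agent sees only observations, so a novelty injected on the simulator side is, by construction, unanticipated by the agent's model.

```python
import random
from abc import ABC, abstractmethod

class Agent(ABC):
    """Game-playing agent: sees only observations, never simulator internals."""
    @abstractmethod
    def act(self, observation: dict) -> str: ...

class RandomAgent(Agent):
    """Trivial baseline agent for illustration."""
    def act(self, observation: dict) -> str:
        return random.choice(observation["legal_actions"])

class Simulator:
    """Owns the game state; novelty is injected here, opaque to the agent.

    `rent` stands in for any rule parameter (hypothetical example).
    """
    def __init__(self, rent: int = 50):
        self.rent = rent

    def inject_novelty(self, new_rent: int) -> None:
        # Unanticipated rule change, e.g. rent doubles mid-game;
        # the agent is never told this happened.
        self.rent = new_rent

    def observe(self) -> dict:
        # The agent receives only this observation dictionary.
        return {"rent": self.rent, "legal_actions": ["pay", "mortgage"]}

def run_step(agent: Agent, sim: Simulator) -> str:
    """One interaction step across the agent/simulator boundary."""
    return agent.act(sim.observe())
```

Because the evaluator controls `Simulator` independently of the agent codebase, novelties can be introduced without any risk that agent developers tuned their models against them.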