🤖 AI Summary
This work addresses the challenge of improving the generalisation and robustness of AI agents in open-ended, dynamic environments, particularly under out-of-distribution conditions and unforeseen interactions with other agents. The thesis makes four contributions. First, it introduces MiniHack, a sandbox framework built on the game of NetHack that uses procedural content generation to construct diverse tasks for reinforcement learning (RL) agents with a focus on generalisation. Second, it presents Maestro, a method for generating adversarial curricula that progressively improve the robustness and generality of RL agents in two-player zero-sum games. Third, it applies quality-diversity (QD) optimisation to systematically uncover vulnerabilities in state-of-the-art pre-trained RL policies in a football video game characterised by intertwined cooperative and competitive dynamics. Finally, it employs evolutionary search to generate diverse adversarial prompts that elicit undesirable outputs from large language models (LLMs), enabling their robustness to be diagnosed and improved. Together, these contributions advance the systematic evaluation and enhancement of robustness in AI agents facing unseen environments, distributional shifts, and complex inter-agent interactions.
📝 Abstract
The growing prevalence of artificial intelligence (AI) in various applications underscores the need for agents that can successfully navigate and adapt to an ever-changing, open-ended world. A key challenge is ensuring these AI agents are robust, excelling not only in familiar settings observed during training but also effectively generalising to previously unseen and varied scenarios. In this thesis, we harness methodologies from open-endedness and multi-agent learning to train and evaluate robust AI agents capable of generalising to novel environments, out-of-distribution inputs, and interactions with other co-player agents. We begin by introducing MiniHack, a sandbox framework for creating diverse environments through procedural content generation. Based on the game of NetHack, MiniHack enables the construction of new tasks for reinforcement learning (RL) agents with a focus on generalisation. We then present Maestro, a novel approach for generating adversarial curricula that progressively enhance the robustness and generality of RL agents in two-player zero-sum games. We further probe robustness in multi-agent domains, utilising quality-diversity methods to systematically identify vulnerabilities in state-of-the-art, pre-trained RL policies within a complex football video game characterised by intertwined cooperative and competitive dynamics. Finally, we extend our exploration of robustness to the domain of LLMs. Here, our focus is on diagnosing and enhancing the robustness of LLMs against adversarial prompts, employing evolutionary search to generate a diverse range of effective inputs that aim to elicit undesirable outputs from an LLM. This work collectively paves the way for future advancements in AI robustness, enabling the development of agents that not only adapt to an ever-evolving world but also thrive in the face of unforeseen challenges and interactions.