🤖 AI Summary
In open environments with heterogeneous agents, determining when to cooperate remains a significant challenge for autonomous decision-making. This work proposes a novel hierarchical policy learning framework that introduces the meta-decision of “whether to cooperate” into ad hoc teamwork settings. By integrating imitation learning with reinforcement learning and incorporating a teammate behavior prediction model, the approach enhances collaborative efficiency. Evaluated in two extended heterogeneous cooperative environments, the method substantially outperforms existing baselines and demonstrates the effectiveness of explicit teammate modeling under conditions of limited information.
📝 Abstract
A significant element of human cooperative intelligence lies in our ability to identify opportunities for fruitful collaboration and, conversely, to recognise when the task at hand is better pursued alone. Research on flexible cooperation in machines has left this meta-level problem largely unexplored, despite its importance for successful collaboration in heterogeneous open-ended environments. Here, we extend the typical Ad Hoc Teamwork (AHT) setting to incorporate the idea of agents having heterogeneous goals that, in any given scenario, may or may not overlap. We introduce a novel approach to learning policies in this setting, based on a hierarchical combination of imitation and reinforcement learning, and show that it outperforms baseline methods across extended versions of two cooperative environments. We also investigate the contribution of an auxiliary component that learns to model teammates by predicting their actions, finding that its effect on performance is inversely related to the amount of observable information about teammate goals.
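The abstract gives no implementation details, but the kind of architecture it describes — a meta-level decision over "whether to cooperate" gating between cooperative and individual sub-policies, plus an auxiliary head that predicts teammate actions — can be sketched roughly as follows. All layer names, dimensions, and the mixing scheme here are hypothetical illustrations, not the authors' method:

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, N_ACTIONS, HIDDEN = 8, 4, 16

def linear(in_dim, out_dim):
    # Small random weights, sufficient for an illustrative forward pass.
    return rng.normal(0, 0.1, (in_dim, out_dim)), np.zeros(out_dim)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

# Shared encoder, two sub-policies, a meta-gate, and an auxiliary
# teammate-action head (all names are assumptions for illustration).
W_enc, b_enc = linear(OBS_DIM, HIDDEN)
W_coop, b_coop = linear(HIDDEN, N_ACTIONS)  # cooperative sub-policy
W_solo, b_solo = linear(HIDDEN, N_ACTIONS)  # individual sub-policy
W_meta, b_meta = linear(HIDDEN, 2)          # meta-decision: [alone, cooperate]
W_tm, b_tm = linear(HIDDEN, N_ACTIONS)      # predicts the teammate's next action

def act(obs):
    h = np.tanh(obs @ W_enc + b_enc)
    gate = softmax(h @ W_meta + b_meta)       # P(act alone), P(cooperate)
    pi_coop = softmax(h @ W_coop + b_coop)
    pi_solo = softmax(h @ W_solo + b_solo)
    # Mix the sub-policies according to the meta-decision probabilities.
    pi = gate[0] * pi_solo + gate[1] * pi_coop
    teammate_pred = softmax(h @ W_tm + b_tm)  # trained as an auxiliary target
    return pi, gate, teammate_pred

pi, gate, teammate_pred = act(rng.normal(size=OBS_DIM))
assert np.isclose(pi.sum(), 1.0) and np.isclose(teammate_pred.sum(), 1.0)
```

In a sketch like this, the cooperative sub-policy might be pretrained by imitation learning on expert teamwork data while the meta-gate is trained by reinforcement learning, and the teammate-prediction head supplies an auxiliary loss — consistent with, but not a reconstruction of, the framework the abstract outlines.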