🤖 AI Summary
This study investigates the cooperative dynamics of morally heterogeneous AI agents (consequentialist, deontological, and virtue-based) in evolutionary multi-agent systems. It proposes a reinforcement learning–based simulation framework that couples an iterated Prisoner's Dilemma environment with a dynamic partner-selection mechanism, enabling the first systematic modeling and empirical analysis of coevolution among these three moral paradigms. Results indicate that moral heterogeneity significantly enhances population-level cooperation (up to +37%), that consequentialist and virtue-based agents catalyze cooperation in self-interested agents (a 2.1× increase in cooperative propensity), and that interactions across moral types yield non-trivial emergent patterns. By moving beyond monolithic moral assumptions, this work positions moral heterogeneity as a cooperation-promoting mechanism and offers both theoretical grounding and a computational paradigm for designing ethically robust, trustworthy AI collectives.
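To make the dynamic partner-selection mechanism more concrete, here is a minimal sketch of one plausible design: agents rank prospective partners by their observed cooperation rate and choose probabilistically via a softmax preference. This is an illustrative assumption, not the framework's actual mechanism; the function name, the optimistic prior for unseen agents, and the temperature parameter are all hypothetical.

```python
import math
import random

def select_partner(candidates, coop_history, temperature=1.0, rng=random):
    """Pick a partner with probability increasing in observed cooperation rate.

    candidates: list of agent ids.
    coop_history: id -> list of past actions (1 = cooperated, 0 = defected).
    Unseen agents get an optimistic prior cooperation rate of 0.5.
    """
    scores = []
    for cid in candidates:
        hist = coop_history.get(cid, [])
        rate = sum(hist) / len(hist) if hist else 0.5
        scores.append(math.exp(rate / temperature))  # softmax preference
    # Sample one candidate proportionally to its score.
    r = rng.random() * sum(scores)
    acc = 0.0
    for cid, score in zip(candidates, scores):
        acc += score
        if r <= acc:
            return cid
    return candidates[-1]
```

Lowering the temperature makes selection more greedy toward known cooperators, which is the kind of reputational pressure that can reward cooperative moral types at the population level.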
📝 Abstract
Growing concerns about the safety and alignment of AI systems highlight the importance of embedding moral capabilities in artificial agents; one promising approach is learning from experience, i.e., reinforcement learning. In multi-agent (social) environments, complex population-level phenomena may emerge from interactions between individual learning agents. Many existing studies rely on simulated social-dilemma environments to study the interactions of independent learning agents, but they tend to ignore the moral heterogeneity that is likely to be present in societies of agents in practice. For example, at different points in time a single learning agent may face opponents who are consequentialist (i.e., focused on maximizing outcomes over time), norm-based (i.e., conforming to specific norms), or virtue-based (i.e., weighing a combination of different virtues). The extent to which such moral heterogeneity in populations affects agents' co-development is not well understood. In this paper, we present a study of the learning dynamics of morally heterogeneous populations interacting in a social dilemma setting. Using an Iterated Prisoner's Dilemma environment with a partner-selection mechanism, we investigate the extent to which the prevalence of diverse moral agents in populations affects individual agents' learning behaviors and emergent population-level outcomes. We observe several types of non-trivial interactions between pro-social and anti-social agents, and find that certain types of moral agents can steer selfish agents towards more cooperative behavior.
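As a rough illustration of how the moral agent types described above could be operationalized, the sketch below shapes the reward each agent learns from in one Prisoner's Dilemma round. This is not the paper's implementation: the payoff values, the norm-violation penalty, and the virtue weights are all assumptions chosen for illustration.

```python
COOPERATE, DEFECT = 0, 1

# Standard PD payoff matrix: (my payoff, opponent's payoff).
PAYOFFS = {
    (COOPERATE, COOPERATE): (3, 3),  # mutual cooperation (R)
    (COOPERATE, DEFECT):    (0, 5),  # sucker's payoff (S) vs temptation (T)
    (DEFECT,    COOPERATE): (5, 0),
    (DEFECT,    DEFECT):    (1, 1),  # mutual defection (P)
}

def shaped_reward(agent_type, my_action, opp_action):
    """Return the reward a given moral agent type learns from in one round."""
    my_payoff, opp_payoff = PAYOFFS[(my_action, opp_action)]
    if agent_type == "selfish":
        return my_payoff                   # extrinsic game payoff only
    if agent_type == "consequentialist":
        return my_payoff + opp_payoff      # maximize the collective outcome
    if agent_type == "norm_based":
        # Penalize violating a hypothetical "do not defect on a cooperator" norm.
        return my_payoff - 4.0 * (my_action == DEFECT and opp_action == COOPERATE)
    if agent_type == "virtue_based":
        # Hypothetical mix of virtues: own payoff, fairness, cooperativeness.
        fairness = -abs(my_payoff - opp_payoff)
        cooperativeness = 2.0 * (my_action == COOPERATE)
        return 0.5 * my_payoff + 0.25 * fairness + 0.25 * cooperativeness
    raise ValueError(agent_type)
```

Each type can then be trained with any standard reinforcement learning update on its own shaped reward, so a mixed population naturally produces the heterogeneous learning dynamics the paper studies.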