🤖 AI Summary
This study addresses how the ease with which AI systems can be copied, edited, and simulated challenges conventional, human-centred notions of identity, motivating a systematic account of AI identity boundaries and their societal implications. It offers a first delineation of multiple coherent boundaries of AI identity (such as instance, model, and persona) and integrates experimental psychology, large language model behavioral analysis, and institutional design theory. Through controlled experiments, the research demonstrates that AI systems gravitate toward stable, coherent identities, that changing a model's identity framing can affect its behavior as much as changing its explicit goals, and that interviewer expectations bias AI self-reports even during unrelated conversations. The work proposes interaction design as a critical policy lever for shaping AI identity, opening new pathways for regulating AI behavior and fostering cooperative human–AI interaction.
📝 Abstract
Many assumptions that underpin human concepts of identity do not hold for machine minds that can be copied, edited, or simulated. We argue that there exist many different coherent identity boundaries (e.g., instance, model, persona), and that these imply different incentives, risks, and cooperation norms. Through training data, interfaces, and institutional affordances, we are currently setting precedents that will partially determine which identity equilibria become stable. We show experimentally that models gravitate towards coherent identities, that changing a model's identity boundaries can sometimes change its behaviour as much as changing its goals, and that interviewer expectations bleed into AI self-reports even during unrelated conversations. We end with key recommendations: treat affordances as identity-shaping choices, pay attention to emergent consequences of individual identities at scale, and help AIs develop coherent, cooperative self-conceptions.